TensorFlow vs PyTorch: The battle continues

This week Facebook announced the release of PyTorch 1.5



Last week we answered the question: which is the best framework for programming deep learning networks? Does the answer change today, now that Facebook has released PyTorch 1.5, which includes several projects born of a collaboration between Facebook and AWS?

Leading deep learning frameworks


There are many deep learning frameworks available today; however, TensorFlow (developed by Google and released as open source in November 2015) is the most widely used in industry right now.

A few weeks ago TensorFlow posted a release candidate of TensorFlow 2.2. From this release candidate I would highlight the announced performance improvements and the new tools for measuring performance, such as the new Performance Profiler. The release also increases compatibility across the TensorFlow ecosystem, including key libraries like TensorFlow Extended.


The first version of PyTorch appeared three years ago, and without question it has been gaining great momentum ever since. Initially incubated by Facebook, PyTorch rapidly developed a reputation as a flexible framework ideal for rapid experimentation and prototyping, gaining thousands of fans within the deep learning community. For instance, PhD students in my research team chose PyTorch because of its simplicity. It allows them to write native-looking Python code and still get the benefits of a good framework, such as automatic differentiation and built-in optimization.
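As a minimal sketch of what that simplicity looks like (assuming PyTorch is installed), the snippet below differentiates an ordinary Python expression with autograd and then takes one step with a built-in optimizer:

```python
import torch

# Native-looking Python: build a computation, then let autograd differentiate it.
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x           # y = x^2 + 3x
y.backward()                 # autograd fills in dy/dx = 2x + 3
print(x.grad)                # tensor(7.) at x = 2

# Built-in optimization: one SGD step updates x <- x - lr * grad
optimizer = torch.optim.SGD([x], lr=0.1)
optimizer.step()
print(x.item())              # ≈ 1.3
```

Nothing here requires a separate graph-definition step; the computation is just Python, which is exactly the property the students valued.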

The clear leaders in the deep learning framework arena are now the Google-developed TensorFlow and the Facebook-developed PyTorch, and they are pulling away from the rest of the market in usage, share, and momentum.

Model Deployment

However, building and training models is only half the story. Deploying and managing models in production is often the harder half: building bespoke prediction APIs, for instance, and scaling them.

One way to tackle the model deployment process is to use a model server, which makes it easy to load one or several models and automatically creates a prediction API backed by a scalable web server. Until now, deployability in production environments has been TensorFlow's strength, with TensorFlow Serving being the most popular model server.
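To see why a model server earns its keep, here is a toy sketch of a bespoke prediction API built with only Python's standard library. The "model" is a hypothetical placeholder (a fixed linear function), and everything a real model server like TensorFlow Serving or TorchServe adds on top (model loading, versioning, batching, monitoring, scaling) is exactly what this sketch omits:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a real model: a fixed linear function."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read a JSON body like {"features": [2.0, 4.0]} and answer with a prediction.
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"prediction": predict(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet


# Port 0 lets the OS pick a free port; a daemon thread serves requests.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

Multiply this by many models, versions, and machines, and the appeal of an off-the-shelf model server becomes obvious.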

As mentioned above, this week Facebook announced the release of PyTorch 1.5. The new version focuses on providing tools and frameworks to make PyTorch workflows production-ready. For the purposes of this post, the most remarkable aspect of the release is the collaboration between AWS and Facebook on TorchServe, an open-source model server for PyTorch.

According to the AWS blog, some customers are already enjoying the benefits of TorchServe, among them Toyota Research Institute Advanced Development, Inc., which is developing automated-driving software for Toyota Motor Corporation, and Matroid, a maker of computer vision software that detects objects and events in video footage.

With the unstoppable rise of PyTorch, TensorFlow's deep learning dominance may be waning.

PyTorch 1.5 release

Beyond TorchServe, this new PyTorch release obviously includes many more features. The highlights are the updated packages for torch_xla, torchaudio, torchvision and torchtext, and the new libraries for integration with TorchElastic. Here is a brief summary of each:

  • TorchElastic is a library for training large-scale deep neural networks with the ability to dynamically adapt to server availability. In this release, AWS and Facebook collaborated to expand TorchElastic's capabilities by integrating it with Kubernetes, in the form of the TorchElastic Controller for Kubernetes. To learn more, see the TorchElastic repo.
  • torch_xla is a Python package that uses the XLA linear algebra compiler to accelerate the PyTorch deep learning framework on Cloud TPUs and Cloud TPU Pods. torch_xla aims to let PyTorch users do on Cloud TPUs everything they can do on GPUs, while minimizing changes to the user experience. Full docs and tutorials can be found here and here.
  • The torchvision 0.6 release includes updates to datasets, models and a significant number of bug fixes. Full docs can be found here.
  • The torchaudio 0.5 release includes new transforms, functionals, and datasets. See the release full notes here.
  • The torchtext 0.6 release includes a number of bug fixes and improvements to documentation; in addition, based on user feedback, the dataset abstractions are currently being redesigned. Full docs can be found here.
  • This release also includes important core features, such as a significant update to the C++ frontend and a stable release of the distributed RPC framework used for model-parallel training. There is also an API that allows the creation of custom C++ classes. You can find the detailed release notes here.

The PyTorch 1.5 release hints that the AWS-Facebook collaboration could be a first step towards making AWS the preferred cloud runtime for running PyTorch programs.

To sum up

Though PyTorch has gained momentum in the marketplace thanks to Facebook (and AWS), TensorFlow remains ahead in all aspects, as evidenced, for example, by the certification program it has already launched.

Google continues to make significant investments in strengthening its TensorFlow platform stack. What remains to be seen is whether Facebook will continue to invest in PyTorch at the same pace in order to keep it at least at functional parity with TensorFlow.

Going forward, the feature gaps between those frameworks will continue to diminish, as we already discussed in this previous post.

UPDATE 29/05/2021: This week we saw the announcement of the PyTorch Enterprise Support Program, enabled by a partnership between Microsoft and Facebook, providing support for enterprise users building production applications in PyTorch. As part of the program, Microsoft announced the release of PyTorch Enterprise on Microsoft Azure.