How to Take Advantage of the New Disruptive AI Technology Called Transformers

Since 2017, Transformers have driven impressive progress in the field of deep learning. Many of us consider them the most important development of recent years, and the one with the greatest potential in the area. For this reason, I believe it is worthwhile to follow their progress closely.

Transformers were introduced in the seminal paper «Attention Is All You Need» by Vaswani et al. The gist of the paper is an architecture built entirely on a mechanism called «neural attention», dispensing with recurrence and convolutions. Attention has quickly become one of the most influential ideas in deep learning applied to the NLP domain. The same attention mechanisms that make Transformers so effective for language models can also be used in other domains, and Transformers are now finding considerable success in areas such as computer vision.
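To make the idea of attention more concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the paper, written in plain Python. It is an illustration under simplifying assumptions, not a full Transformer layer: the learned projection matrices and the multiple heads are left out, and the three token vectors are made up for the example.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        w = softmax(scores)
        # The output is the weighted average of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values))
               for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy self-attention: three made-up 2-dimensional token vectors
# serve as queries, keys, and values at the same time.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
```

Each row of `weights` says how much one token "looks at" every other token, and each output is a blend of all the value vectors according to those weights. Nothing in the computation is specific to words, which is one reason the same mechanism carries over to image patches and other kinds of input.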

One of the advantages of Transformers is their capability to learn without the need for labeled data. For example, Transformers can develop representations through unsupervised (more precisely, self-supervised) learning on raw text. They can then apply those representations to fill in the blanks in incomplete sentences or to generate coherent text after receiving a prompt.

However, training Transformers from scratch remains largely a privilege of the big technology companies with access to vast data sources and compute resources. For example, OpenAI’s popular GPT-3 model is estimated to have cost around 10 million dollars to train, a sum that most companies cannot afford.

Making Transformers more accessible to mainstream deep learning applications is one of the most exciting challenges of the next few years in practical deep learning. Here, the community can take advantage of what we know as transfer learning. Transfer learning consists of training a machine learning model for one task and reusing the knowledge gained for a different but related task. Nowadays, the artificial intelligence research and engineering community is highly collaborative, and its members are eager to help each other. This has led top research groups to open-source numerous datasets, research results, and current models so that others can build on them [1].
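The idea of transfer learning can be sketched in a few lines of plain Python. This is a toy illustration only, and every number in it is an assumption: `PRETRAINED_WEIGHTS` stands in for a model body that someone else trained at great cost, and the four labeled examples stand in for a small target task. The body stays frozen, and only a small new "head" is trained on top of it.

```python
import math

# Stand-in for a pretrained model body. In practice these weights would come
# from training on a large source task; here they are made-up constants.
PRETRAINED_WEIGHTS = [[1.0, -1.0], [0.5, 0.5]]

def extract_features(x):
    """Frozen feature extractor: never updated while training the new task."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(examples, labels, epochs=500, lr=0.5):
    """Fit a small logistic-regression head on top of the frozen features."""
    feats = [extract_features(x) for x in examples]
    head = [0.0] * len(feats[0])
    bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            pred = sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)
            err = pred - y  # gradient of the log loss w.r.t. the logit
            head = [h - lr * err * fi for h, fi in zip(head, f)]
            bias -= lr * err
    return head, bias

def predict(x, head, bias):
    """Probability of class 1 for input x, using frozen body + trained head."""
    f = extract_features(x)
    return sigmoid(sum(h * fi for h, fi in zip(head, f)) + bias)

# Tiny target task with only four labeled examples (made up for illustration).
examples = [[2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 1.5]]
labels = [1, 1, 0, 0]
head, bias = train_head(examples, labels)
predictions = [predict(x, head, bias) for x in examples]
```

Only `head` and `bias` are learned here, which is why the new task gets by with very little data and compute. Fine-tuning a real pretrained Transformer follows the same pattern at a much larger scale, often updating some or all of the pretrained weights as well.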

The best example of this is Hugging Face, which has become a popular platform that provides state-of-the-art Transformer models using the power of transfer learning, offering an open-source library to build, train, and share them. In short, there has been a paradigm shift in the last few years: transfer learning for Transformers has started to dramatically change how accessible it is to integrate research models into business applications, a new step toward democratizing the use of this technology. It is a development worth following closely.


  1. In our research group at the BSC / UPC, we have been working on Transformers since mid-2019. Some of this work was published in: Ferrando, J.; Dominguez, J.; Torres, J.; García, R.; García, D.; Garrido, D.; Cortada, J.; Valero, M. Improving accuracy and speeding up document image classification through parallel systems. International Conference on Computational Science (ICCS), June 2020.