Without the Rise of Supercomputers There Will Be No Progress in Artificial Intelligence

We need solutions that bridge these two research worlds

Although artificial intelligence (AI) research began in the 1950s, and the fundamental algorithms behind neural networks were already known in the last century, the discipline lay dormant for many years due to the lack of results that could excite either academia or industry.

Artificial intelligence has been around since the middle of the last century. John McCarthy coined the term Artificial Intelligence in the 1950s and, along with Marvin Minsky, is considered one of its founding fathers. In 1958, Frank Rosenblatt built a prototype neural network, which he called the Perceptron. Since then, artificial intelligence has experienced several waves of optimism, followed by disappointment and the loss of funding and interest (periods known as AI winters), followed in turn by new approaches, success, and renewed funding.

It was not until 2012 that advances already established in supercomputing revived this area of research.

In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton used GPU hardware accelerators for the first time to train their neural network. GPUs were already being used at that time in supercomputing centers like ours in Barcelona to speed up the execution of applications that require a massive number of calculations.
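
As a rough illustration of why this matters (just a sketch, not the 2012 setup: the matrix size and the timing code are arbitrary), the same large matrix multiplication can be timed on the CPU and, when one is available, on a GPU:

```python
import time

import torch

# Toy comparison: the same large matrix multiplication on the CPU and,
# if a CUDA-capable GPU is present, on the GPU. The size is arbitrary,
# chosen only to make the difference visible.
n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

start = time.time()
c_cpu = a @ b
print(f"CPU matmul: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()   # wait for the host-to-device copies to finish
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()   # wait for the GPU kernel to finish
    print(f"GPU matmul: {time.time() - start:.3f} s")
```

On a typical accelerator the GPU version runs many times faster, and this kind of arithmetic throughput is exactly what neural network training consists of.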

That moment opened a new era of AI that merges key mathematical ideas with supercomputing knowledge, enabling the creation of new large AI models.

Starting in 2017, Transformers have facilitated impressive progress in the field of deep learning.

But nowadays, training these new large AI models requires massive amounts of computation and execution time. That has become a significant challenge, since scaling up large AI models on today's parallel and distributed infrastructures demands deep supercomputing expertise.
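
To give a flavour of what that scaling involves, here is a conceptual sketch of data parallelism, the most common pattern behind distributed training. The linear model, synthetic data, and single-process "workers" are toy placeholders; real systems (PyTorch DistributedDataParallel, Horovod, and similar) perform the gradient averaging with collective communication (all-reduce) across many nodes:

```python
import numpy as np

# Conceptual sketch of data parallelism: each worker computes gradients on
# its own shard of the batch, and the gradients are averaged before the
# weight update, as if the full batch had been processed on one device.
rng = np.random.default_rng(0)
n_workers, batch, dim = 4, 256, 32
w = np.zeros(dim)                          # toy linear model weights
X = rng.normal(size=(batch, dim))          # one global batch
y = X @ rng.normal(size=dim)               # synthetic targets

def local_gradient(w, X_shard, y_shard):
    """Mean-squared-error gradient on one worker's shard."""
    err = X_shard @ w - y_shard
    return X_shard.T @ err / len(y_shard)

# Each worker sees only its slice of the batch...
shards = zip(np.array_split(X, n_workers), np.array_split(y, n_workers))
grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]

# ...and an all-reduce (simulated here by np.mean) averages the gradients
# so every worker applies the same update.
w -= 0.01 * np.mean(grads, axis=0)
```

The idea is simple on paper; making it efficient across thousands of devices, where communication and synchronization become the bottleneck, is where the supercomputing expertise comes in.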

A recent paper from Google presents a multilingual translation model whose training requires the equivalent of 22 years of computation on a single TPU (if we had only one TPU available, the training would take us 22 years). By distributing the training over 2048 TPUs, Google obtained the results in only 4 days.
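
A quick back-of-the-envelope check, assuming near-perfect linear scaling across the 2048 TPUs, shows that the two figures are consistent:

```python
# Back-of-the-envelope check of the figures above, assuming near-perfect
# (linear) scaling of the training across TPUs.
tpu_years = 22
n_tpus = 2048
days = tpu_years * 365 / n_tpus
print(f"{days:.1f} days")   # ~3.9 days, consistent with the reported 4 days
```

In practice, scaling is never perfectly linear, which is precisely why the expertise mentioned above is needed.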

In summary, the AI revolution is not only about new mathematical models; it's about how to take advantage of the unprecedented opportunities that supercomputing offers for next-generation AI methods.

A graphic from OpenAI that has become very popular shows that, since 2012, the amount of computation required (or available) to train AI models has increased exponentially, with a 3.4-month doubling time.
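
To put that doubling time in perspective, a simple calculation (assuming the trend is exactly exponential at that rate) gives the growth factor it implies per year and over a longer, illustrative span:

```python
# What a 3.4-month doubling time implies, assuming the growth is exactly
# exponential at that rate.
doubling_months = 3.4

per_year = 2 ** (12 / doubling_months)
print(f"~{per_year:.0f}x more compute per year")   # roughly 11-12x

years = 6   # illustrative span, e.g. 2012 onwards
total = 2 ** (years * 12 / doubling_months)
print(f"~{total:,.0f}x over {years} years")
```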

And this is what motivates courses such as SA in the MIRI Master's degree, taught at UPC Barcelona Tech in collaboration with the Emerging Technologies for AI research group at BSC. 😉