Supercomputers Architecture Course (SA-MIRI)


UPC Master in Innovation and Research in Informatics 

(specialization High Performance Computing)  

Official Course Web Site

The course starts on September 16th (Wednesday) 2020


  • Masks: Masks (hygienic, such as those provided by the UPC, or surgical) must be worn at all times, for all activities and in all spaces.
  • Hand hygiene: Hands must be washed when entering and leaving buildings, classrooms and laboratories. In addition to the toilets, hydroalcoholic gel dispensers are available at the entrances to all these spaces.
  • Safety distance: Maintain a distance of 1.5 meters at all times (2 meters in the case of the teacher). If this is not possible, the use of FFP2 masks is recommended.
  • Classrooms: Classrooms will operate at 50% of their capacity, with unused seats marked with a red cross. Teachers are asked to avoid walking around the classroom while teaching and to always keep a minimum distance of 2 meters from the first row of students. The UPC also recommends that, as far as possible, the teacher be the last person to enter the classroom once the students have taken their seats, and the first person to leave at the end of class.
  • Traceability of close contacts: A record must be kept of which students have sat near each other in the classroom. Until a tool automates this procedure, teachers must keep a list of the seats students have occupied in the classroom or laboratory.
  • Circulation inside buildings: Follow the marked circulation flows, avoiding crowds. On the stairs, follow the signs for the ascending and descending routes.

Class Days/Time:  Monday/Wednesday 8:00-10:00

Course Description:

Supercomputers represent the leading edge of high-performance computing technology. This course will describe all the elements in the system architecture of a supercomputer, from the shared-memory multiprocessor in the compute node to the interconnection network and distributed-memory cluster, including the infrastructures that host them. We will also discuss their building blocks and the system software stack, including the parallel programming models and their associated performance analysis tools; exploiting parallelism is central to achieving greater computational power. We will introduce the continuous development of supercomputing systems that enables their convergence with the advanced analytics algorithms required in today's world. Here we will pay special attention to Deep Learning algorithms and their execution on GPU platforms. The practical component is the most important part of this subject. The course follows a “learn by doing” method, with a set of hands-on exercises based on problems that students must carry out throughout the course. The course will be marked by continuous assessment, which ensures constant and steady work. The method is also based on teamwork and a ‘learn to learn’ approach of reading and presenting papers; in this way, students will be able to adapt to and anticipate the new technologies that will arise in the coming years. For the labs we will use supercomputing facilities of the Barcelona Supercomputing Center (BSC-CNS).

[CAT]Els supercomputadors són l’exponent de la tecnologia de computació d’alt rendiment. En aquest curs estudiarem tots els elements en l’arquitectura del sistema d’un supercomputador, des del multiprocessador de memòria compartida a la xarxa d’interconnexió i clúster de memòria distribuïda, incloent les infraestructures que allotgen aquests supercomputadors. També discutirem la pila de programari del sistema amb els models de programació paral·lela i les seves eines d’anàlisi de rendiment associats. Finalment parlarem de l’evolució d’aquests sistemes de supercomputació perquè permetin la convergència d’aquests amb la costosa analítica avançada que requereix el món actual. En aquest punt, pararem especial atenció els algoritmes de Deep Learning i la seva execució en les actuals plataformes amb GPUs. La component pràctica és la part més important d’aquesta assignatura. En aquest curs es fa servir el mètode de “learn by doing”, amb un conjunt de Hands-on basats en problemes reals que els estudiants han de dur a terme al llarg del curs. Es realitzarà una avaluació continuada al llarg del curs que no permet relaxació i que produeix millors resultats i major motivació entre els estudiants. El mètode es basa igualment en el treball en equip i que l’alumne ‘aprengui a aprendre’ mitjançant la lectura i presentació d’articles. D’aquesta manera l’estudiant serà capaç d’adaptar-se i anticipar-se a les tecnologies que arribaran en els propers anys. Per a la part pràctica farem servir recursos de supercomputació del Barcelona Supercomputing Center (BSC-CNS).

[ES]Los supercomputadores son el exponente de la tecnología de computación de alto rendimiento. En este curso estudiaremos todos los elementos en la arquitectura del sistema de un supercomputador, desde el multiprocesador de memoria compartida a la red de interconexión y clúster de memoria distribuida, incluyendo las infraestructuras que alojan estos supercomputadores. También discutiremos la pila de software del sistema con los modelos de programación paralela y sus herramientas de análisis de rendimiento asociados. Por último vamos a discutir la evolución de estos sistemas de supercomputación para que permitan la convergencia de estos con la compleja analítica avanzada que se requiere en el mundo actual. En este punto, vamos a parar especial atención a los algoritmos de Deep Learning y su ejecución en las plataformas de GPUs. La componente práctica es la parte más importante de esta asignatura. En este curso se utiliza el método de “learn by doing”, con un conjunto de Hands-on basados en problemas reales que los estudiantes deben llevar a cabo a lo largo del curso. Se realizará una evaluación continuada a lo largo del curso que no permite relajación y que produce mejores resultados y mayor motivación entre los estudiantes. El método se basa igualmente en el trabajo en equipo y que el alumno ‘aprenda a aprender’ mediante la lectura y presentacion de artículos. De esta manera el estudiante será capaz de adaptarse y anticiparse a las tecnologías que llegaran en los próximos años. Para la parte práctica usaremos recursos de supercomputación del Barcelona Supercomputing Center (BSC-CNS).


6.0 ECTS (a student effort equivalent to 150 hours, i.e. more than 10 hours per week (4 hours face to face + 6 hours outside class, on average) during approx. 14 weeks).
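As a sanity check, the 150-hour figure follows from the common European convention of roughly 25 hours of work per ECTS credit (that exact ratio is an assumption here, not stated above):

```python
# Workload arithmetic behind the 6.0 ECTS figure.
ects = 6.0
hours_per_ects = 25                    # assumed EHEA convention, ~25 h/credit
total_hours = ects * hours_per_ects    # 150.0 hours in total
weeks = 14
per_week = total_hours / weeks         # average weekly effort
print(total_hours, round(per_week, 1))  # 150.0 10.7
```

This matches the "more than 10 hours per week" estimate in the syllabus.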


Proficiency in C programming and Linux basics will be expected in the course. Prior exposure to parallel programming constructs, the Python language, experience with linear algebra/matrices, or machine learning knowledge will be helpful.

Course Activities:

Class attendance and participation: Regular and consistent attendance is expected, as is the ability to discuss concepts covered during class.

Lab activities: Hands-on sessions will be conducted during the course using supercomputing facilities. Each hands-on will involve writing a lab report with all the results.

Reading/Presentation Assignments: 6 assignments that include reading documentation/papers that expand the concepts introduced during lectures.

Assessment: There will be 2 short midterm exams during the course (plus occasional pop quizzes that can be used to replace attendance if the situation requires it).

Student presentations: Randomly chosen students/groups will present the reading assignments (presentations/projects).

Grading Procedure  

The evaluation of this course will take into account different items (tentative):

  • Attendance (minimum 80% required) & participation in class will account for 20% of the grade.
  • Readings, Presentations (and Homework) will account for 20% of the grade.
  • Exams will account for 20% of the grade.
  • Lab sessions (+ Lab reports) will account for 40% of the grade.
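The tentative weights above combine into a final mark as a simple weighted sum; the component scores below are made-up example values, not real grading data:

```python
# Weighted-sum grading sketch using the tentative weights from the list above.
weights = {"attendance": 0.20, "readings": 0.20, "exams": 0.20, "labs": 0.40}
# Hypothetical component scores out of 10, for illustration only.
scores = {"attendance": 9.0, "readings": 7.5, "exams": 8.0, "labs": 8.5}

final = sum(weights[k] * scores[k] for k in weights)
print(round(final, 2))  # 8.3
```

Note that the lab component dominates, consistent with the practical focus of the course.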

Tentative theoretical course content (*):


BLOCK 1: SUPERCOMPUTERS BASICS

0. Welcome
1. Supercomputers Basics
2. Supercomputers Architecture
3. Supercomputers Benchmarking
4. General Purpose Supercomputers
5. Resource Management in Supercomputers
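As a taste of topic 3 (Supercomputers Benchmarking), a theoretical peak performance (Rpeak) estimate is just a product of hardware parameters; all the figures below are hypothetical and do not describe any particular machine:

```python
# Back-of-the-envelope Rpeak estimate, the kind of figure compared against
# the measured HPL (Linpack) Rmax in Top500-style benchmarking.
nodes = 100
sockets_per_node = 2
cores_per_socket = 24
clock_ghz = 2.5          # assumed clock frequency
flops_per_cycle = 16     # assumed, e.g. fused multiply-add over wide vector units

rpeak_gflops = (nodes * sockets_per_node * cores_per_socket
                * clock_ghz * flops_per_cycle)
print(rpeak_gflops / 1000, "TFLOP/s")  # 192.0 TFLOP/s
```

Real applications typically sustain only a fraction of this peak, which is one reason benchmarking is a topic of its own.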


BLOCK 2: PARALLEL PROGRAMMING MODELS

6. Parallel Programming Models and Motivation
7. MPI basics
8. Taking Time
9. OpenMP basics
10. MPI Advanced
11. Parallel Performance
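Topic 11 (Parallel Performance) revolves around models such as Amdahl's law. A minimal sketch, assuming a fixed parallel fraction f of the work:

```python
# Amdahl's law: speedup(p) = 1 / ((1 - f) + f / p),
# where f is the fraction of the work that can run in parallel on p processors.
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    return 1.0 / ((1.0 - parallel_fraction)
                  + parallel_fraction / processors)

# With 95% of the work parallelizable, speedup saturates:
# it approaches 1 / (1 - f) = 20x no matter how many processors are added.
for p in (1, 16, 256, 4096):
    print(p, round(amdahl_speedup(0.95, p), 2))
```

This saturation effect is why the serial fraction of a code matters so much at supercomputer scale.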


BLOCK 3: HETEROGENEOUS SUPERCOMPUTERS

12. Heterogeneous Supercomputers
13. Accelerator Architecture
14. Getting Started with CUDA Programming Model
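Topic 14 introduces the CUDA programming model. The following pure-Python sketch emulates CUDA's block/thread indexing for a vector addition; it is a conceptual illustration only, since a real kernel would be a __global__ function written in CUDA C/C++:

```python
# Pure-Python emulation of the CUDA thread-indexing scheme:
# a grid of blocks, each with block_dim threads, covers the array,
# and each (block, thread) pair computes one global index.
def vector_add(a, b, block_dim, grid_dim):
    n = len(a)
    out = [0.0] * n
    for block_idx in range(grid_dim):            # blocks run independently
        for thread_idx in range(block_dim):      # threads within a block
            i = block_idx * block_dim + thread_idx  # global index, as in CUDA
            if i < n:                            # guard: extra threads do nothing
                out[i] = a[i] + b[i]
    return out

a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0] * 5
# 2 blocks of 4 threads cover 5 elements (3 threads idle via the guard).
print(vector_add(a, b, block_dim=4, grid_dim=2))  # [11.0, 12.0, 13.0, 14.0, 15.0]
```

On a GPU the two loops do not exist: all (block, thread) pairs execute the body concurrently, which is the essence of the model.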

BLOCK 4: SUPERCOMPUTING FOR ARTIFICIAL INTELLIGENCE (DEEP LEARNING)

15. Supercomputing, the heart of Deep Learning
16. Software Stack for Artificial Intelligence
17. Deep Learning Basics Concepts
18. Computing Performance
19. Training on Multiple GPUs
20. Training on Multiple Servers
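Topics 19 and 20 rely on data parallelism: each GPU or server computes gradients on its own shard of the batch, and the gradients are averaged (an "all-reduce") before the shared weights are updated. The toy functions below are hypothetical stand-ins for illustration, not a real framework API:

```python
# Minimal data-parallel training step, simulated sequentially.
def local_gradient(weights, shard):
    # Toy gradient of a squared-error-style loss on this worker's data shard.
    return [sum(2 * (w - x) for x in shard) / len(shard) for w in weights]

def allreduce_mean(grads_per_worker):
    # Average the gradient vectors across workers (the "all-reduce" step).
    n = len(grads_per_worker)
    return [sum(g[i] for g in grads_per_worker) / n
            for i in range(len(grads_per_worker[0]))]

weights = [1.0, 2.0]
shards = [[0.0, 1.0], [2.0, 3.0]]   # one data shard per simulated worker
grads = [local_gradient(weights, s) for s in shards]
avg = allreduce_mean(grads)
lr = 0.1
weights = [w - lr * g for w, g in zip(weights, avg)]
print(weights)  # [1.1, 1.9]
```

Because every worker applies the same averaged gradient, all replicas keep identical weights, which is what makes the scheme scale to multiple GPUs and servers.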


(*) The diverse backgrounds of students are a major challenge in teaching SA-MIRI. A “SA-MIRI Entry Survey” is designed to help assess students’ backgrounds (PA-MIRI, CPDS-MIRI, MA-MIRI, CHPC-MIRI, PD-MIRI, SCA-MIRI, PPTM-MIRI, APA-MIRI), expectations, and preferences in order to better customize the course.

Tentative Labs:

  1. Getting Started with Supercomputing
  2. Getting Started with Parallel Programming Models
  3. Getting Started with Parallel Performance Metrics
  4. Getting Started with CUDA Programming Model
  5. Getting Started with Deep Learning Basic Model
  6. Getting Started with Deep Learning Computational Performance
  7. Getting Started with Parallelization of Deep Learning with Multiple GPUs
  8. Getting Started with Deployment of Deep Learning in a Supercomputer
  9. Getting Started with a Distributed Deep Learning Training Process
  10. Getting Started with the Scalability of the Deep Learning Training Process

Tentative Reading and presentations:

  1. Exascale Computing
  2. Basic Supercomputing Performance Metrics
  3. Basic Supercomputing Performance Models
  4. Computer Performance after Moore’s Law
  5. Domain-Specific Supercomputers for Deep Learning
  6. Getting Started with CUDA Programming Model

Professor:

Jordi Torres
Office: Mòdul C6-217 (second floor)

Office Hours: 
By appointment

Additional documentation:


This syllabus will be revised/updated until the start of the course (last modified 01/Jul/2020)