Supercomputing Architecture Course (SA-MIRI)


UPC Master in Innovation and Research in Informatics

(specialization in High Performance Computing)

Official Course Web Site

Course starts on September 6th (Thursday) 2018

Course Description:

Supercomputers represent the leading edge of high-performance computing technology. This course describes all the elements in the system architecture of a supercomputer, from the shared-memory multiprocessor in the compute node to the interconnection network and the distributed-memory cluster, including the infrastructures that host them. We will also discuss their building blocks and the system software stack, including the parallel programming models and their associated performance analysis tools; exploiting parallelism is central to achieving greater computational power. We will introduce the continuous development of supercomputing systems that enables their convergence with the advanced analytics algorithms required in today’s world. Here we will pay special attention to Deep Learning algorithms and their execution on GPU platforms. The practical component is the most important part of this subject. The course follows a “learn by doing” method, with a set of hands-on exercises based on real problems that students carry out throughout the course. Grading is by continuous assessment, which encourages constant and steady work. The method also relies on teamwork and a ‘learn to learn’ approach based on reading and presenting research papers, so that students can adapt to and anticipate the technologies that will arrive in the coming years. For the labs we will use supercomputing facilities of the Barcelona Supercomputing Center (BSC-CNS).



6.0 ECTS


Familiarity with C programming and Linux basics is expected in the course. Prior exposure to parallel programming constructs, the Python language, linear algebra/matrices, or machine learning will be helpful.

Course workload: important warning

The student should be aware that the SA-MIRI 2018 edition is a 6.0 ECTS course that requires an effort from the student equivalent to 150 hours. This means more than 10 hours per week (4 hours in class + 6 hours outside class on average) over 14 weeks. Taking this course is not recommended if you have other commitments this quarter that will prevent you from dedicating the required number of hours; in that case, you can wait for the next edition.

Course Activities:

Class attendance and participation: Regular and consistent attendance is expected, and students should be able to discuss the concepts covered in class.

Lab activities: Hands-on exercises will be conducted during lab sessions using supercomputing facilities. Each hands-on exercise requires a lab report with all the results, to be delivered one week later.

Homework Assignments: Homework will be assigned weekly. It includes reading documentation that expands on the concepts introduced in lectures and, periodically, reading research papers related to the week’s lecture and preparing presentations (with slides).

Assessment: There will be 2 or 3 short midterm exams (and possibly some pop quizzes) during the course.

Student presentation: Randomly chosen students/groups will present their homework (presentations/projects).

Grading Procedure

The evaluation of this course will take into account different items:

  • Attendance (minimum 80% required) and participation in class will account for 15% of the grade.
  • Homework, paper readings, and paper presentations will account for 15% of the grade.
  • Exams will account for 15% of the grade.
  • Lab sessions (+ lab reports) will account for 55% of the grade.

Tentative theoretical course content (*):

  1. Course content and motivation
  2. Supercomputing Basics
  3. HPC Building Blocks (general purpose blocks)
  4. HPC Software Stack (general purpose blocks)
  5. Parallel Programming Models: OpenMP
  6. Parallel Programming Models: MPI
  7. Parallel Performance Metrics and Measurements
  8. HPC Building Blocks for AI servers
  9. Coprocessors and Programming Models
  10. Powering Artificial Intelligence, Machine Learning and Deep Learning with Supercomputing
  11. Parallel AI platforms and their software stack
  12. Distributed AI platforms and their software stack
  13. Conclusions and remarks: Towards Exascale Computing

(*) The varied backgrounds of the students are a major challenge in teaching SA-MIRI. An “SA-MIRI Entry Survey” is designed to help assess students’ backgrounds (PA-MIRI, CPDS-MIRI, MA-MIRI, CHPC-MIRI, PD-MIRI, SCA-MIRI, PPTM-MIRI, APA-MIRI), expectations, and preferences in order to better tailor the course.

Tentative Labs:

In this course, students will use supercomputers from the Barcelona Supercomputing Center.

  1. Supercomputing Building Blocks: Marenostrum visit
  2. Getting Started with Supercomputing
  3. Getting Started with Parallel Programming Models
  4. Getting Started with Parallel Performance Metrics
  5. Getting Started with Parallel Performance Model – I
  6. Getting Started with Parallel Performance Model – II
  7. Getting Started with GPU based Supercomputing
  8. Getting Started with CUDA programming model
  9. Getting Started with Deep Learning Frameworks in a Supercomputer
  10. Getting Started with Deep Learning basic model
  11. Getting Started with a real Deep Learning problem and its solutions
  12. Getting Started with parallelizing a Deep Learning problem
  13. Getting Started with a distributed Deep Learning problem

Professor : 

Jordi Torres
Office: Mòdul C6-217 (second floor)

Office Hours: 
By appointment

Additional documentation:

Class handouts and materials associated with this class can be found on the Racó-FIB web server.


This syllabus will be revised/updated until the start of the course (last modified 01/Sep/2018)