Supercomputers Architecture for Artificial Intelligence
UPC Master in Innovation and Research in Informatics
(specialization High Performance Computing)
Official Course Web Site
This syllabus will be revised/updated until the start of the course (last modified 04/09/2022)
Course Description (2022 edition):
This course introduces the fundamentals of high-performance and parallel computing. It is targeted at scientists and engineers seeking to develop the skills necessary for working with supercomputers, the leading edge in high-performance computing technology.
In the first part of the course, we will cover the basic building blocks of supercomputers and their system software stack. Then, we will introduce their traditional parallel and distributed programming models, which allow one to exploit parallelism, a central element for scaling the applications in these types of high-performance infrastructures.
In the second part of the course, we will motivate the current supercomputing systems developed to support the artificial intelligence algorithms in demand today. This year’s syllabus will pay special attention to Deep Learning (DL) algorithms and their scalability on a GPU platform.
This course follows a “learn by doing” approach based on a set of exercises, consisting of programming problems and paper readings, that students must carry out throughout the course. The course is graded through continuous assessment, which ensures constant, steady work.
All in all, this course seeks to give students practical skills that will help them adapt to and anticipate the new technologies that will undoubtedly emerge in the coming years. For the practical part of the exercises, students will use supercomputing facilities from the Barcelona Supercomputing Center (BSC-CNS).
Important warning about course workload and attendance
Students should be aware that the SA-MIRI 2022 edition requires an effort equivalent to 6.0 ECTS. This course is therefore not recommended for students whose other commitments during the term would prevent them from dedicating the required number of hours to it.
Regular and consistent attendance is mandatory. Occasional absences are accepted only for reasons acceptable to the instructor, for instance health reasons or, in the case of international students, visa matters. Missing classes to attend other courses, such as PATC courses from BSC-CNS, will not be accepted. If you expect to miss classes for reasons of this kind, you should wait and enroll in the next edition of the course.
Programming experience in C and basic Linux knowledge are expected. In addition, prior exposure to parallel programming constructs, the Python language, linear algebra/matrices, or machine learning will be helpful.
Class attendance and participation: Regular attendance is expected and is required in order to take part in the discussion of the concepts covered in class.
Lab activities: Some exercises will be conducted as hands-on sessions during the course using supercomputing facilities. Students will need their own laptops to access these resources during class. Each hands-on session will involve writing a lab report with all the results. There are no separate days for theory and laboratory classes: theoretical and practical activities will be interspersed within the same session to facilitate the learning process.
Reading/presentation assignments: Some exercise assignments will consist of reading documentation/papers that expand on the concepts introduced during lectures. Some exercises will involve presentations by randomly chosen students.
Assessment: There will be one midterm exam in the middle of the course. Students may use any type of documentation during this exam, including digital documentation on their laptops.
The course grade can be obtained through continuous assessment, which takes the following into account:
- 20% Attendance + participation
- 15% Midterm exam
- 65% Exercises (+ exercise presentations) and Lab exercises (+ Lab reports)
Details of the weight of each component of the course in the grade are described in the tentative scheduling section.
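As an illustration of how the three weights above combine, the following is a minimal sketch in Python. The component names, the 0 to 10 grading scale, and the example scores are assumptions made for illustration only; they are not specified in this syllabus.

```python
# Weights taken from the continuous-assessment breakdown above.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "attendance_participation": 0.20,
    "midterm_exam": 0.15,
    "exercises_and_labs": 0.65,
}


def continuous_assessment_grade(scores):
    """Return the weighted grade for a dict of component scores.

    Assumes every component is scored on the same scale
    (0 to 10 is assumed here; the syllabus does not state the scale).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical example scores, not real data.
example_scores = {
    "attendance_participation": 9.0,
    "midterm_exam": 6.0,
    "exercises_and_labs": 8.0,
}
example_grade = continuous_assessment_grade(example_scores)
print(round(example_grade, 2))  # 0.20*9 + 0.15*6 + 0.65*8 = 7.9
```

Because exercises and lab work carry 65% of the weight, a change of one point in that component moves the final grade more than four times as much as the same change in the midterm exam.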
Course Exam: For those students who have not benefited from the continuous assessment, a course exam will be announced during the course. This exam covers the entire course (practical, theoretical, and self-learning parts). During this exam, students are not allowed to use any documentation, either on paper or digital.
Tentative course content (topics):
PART 1: WHAT BSC SUPERCOMPUTERS ARE LIKE AND HOW TO PROGRAM THEM
- Supercomputing basics
- General-purpose supercomputers
- Parallel programming models (*)
- Parallel performance metrics (*)
- Parallel performance models
- Heterogeneous supercomputers
- Parallel programming languages for heterogeneous platforms
- Emerging trends and challenges in supercomputing
PART 2: HOW MODERN SUPERCOMPUTERS CAN BE USED TO ACCELERATE DL TRAINING
- Artificial Intelligence is a supercomputing problem
- Deep Learning essential concepts (*)
- Using supercomputers for training Deep Learning models
- Accelerate the learning with parallel training using a multi-GPU parallel server
- Accelerate the learning with distributed training using multiple parallel servers
- How to speed up the training of Transformers-based models
Each topic ends with an exercise.
(*) The course includes a complete overview of the area, covering all topics. However, because students’ backgrounds are heterogeneous, knowledge of some topics acquired in previous courses can be validated by the teacher, in which case class attendance for those topics is not required.
Tentative list of exercises:
- Exercise 01: Read and present a paper about Exascale Computers Challenges
- Exercise 02: Getting Started with Supercomputing
- Exercise 03: Getting Started with Parallel Programming Models
- Exercise 04: Getting Started with Parallel Performance Metrics
- Exercise 05: Getting Started with Parallel Performance Metrics and Models
- Exercise 06: Comparing Supercomputers Performance
- Exercise 07: Getting Started with CUDA
- Exercise 08: Read and present a paper about Emerging Trends in Supercomputing
- Exercise 09: First Contact with Deep Learning
- Exercise 10: The 60th edition of the TOP500 (Nov 2022 | Dallas, USA)
- Exercise 11: Using supercomputers for training Deep Learning models
- Exercise 12: Accelerate the learning with parallel training using a multi-GPU parallel server
- Exercise 13: Accelerate the learning with distributed training using multiple parallel servers
- Exercise 14: How to speed up the training of Transformers-based models
Office: Mòdul C6-217 (second floor)
- Class handouts and materials associated with this class (can be found on the Racó-FIB web server)
- Thomas Sterling, Matthew Anderson, Maciej Brodowicz. High Performance Computing: Modern Systems and Practices. Morgan Kaufmann, 2018. (available at the library of the UPC Barcelona Tech)
- BSC documentation about MareNostrum 4 and CTE-Power
- Jordi Torres. First Contact with Deep Learning. WATCH THIS SPACE book collection, 2018.
- Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola. Dive into Deep Learning. 2020.