[ONLINE] Efficient multi-GPU and multi-node execution of AI applications and frameworks on the GPU nodes of Karolina supercomputer (EuroCC)

UTC
Zoom


Description

This half-day course teaches how to use the GPU-accelerated part of Karolina efficiently for deep learning and machine learning.

Technical details of the GPU partition of Karolina supercomputer

The accelerated part consists of 72 servers, each equipped with 8 GPU accelerators. Together they provide a performance of 11 PFlop/s for standard HPC simulations and up to 180 PFlop/s for artificial intelligence computations.

72 compute nodes, each with 2x AMD Zen 3 EPYC™ 7763 processors (64 cores, 2.45 GHz) and 8x NVIDIA A100 GPU accelerators with 40 GB HBM2.
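
As a rough sanity check, the aggregate figures above can be broken down per GPU. The short sketch below uses only the numbers quoted in this description; it assumes that the 40 GB of HBM2 is per accelerator and that the PFlop/s figures are simple sums over all GPUs. The variable names are illustrative.

```python
# Figures quoted in the course description.
NODES = 72
GPUS_PER_NODE = 8
HPC_PFLOPS = 11.0    # aggregate, standard HPC simulations
AI_PFLOPS = 180.0    # aggregate peak, AI computations
GPU_MEMORY_GB = 40   # assumption: HBM2 capacity per accelerator

total_gpus = NODES * GPUS_PER_NODE

# 1 PFlop/s = 1000 TFlop/s
hpc_tflops_per_gpu = HPC_PFLOPS * 1000 / total_gpus
ai_tflops_per_gpu = AI_PFLOPS * 1000 / total_gpus
total_gpu_memory_tb = total_gpus * GPU_MEMORY_GB / 1000

print(f"GPUs in the partition: {total_gpus}")
print(f"HPC throughput per GPU: {hpc_tflops_per_gpu:.1f} TFlop/s")
print(f"AI throughput per GPU: {ai_tflops_per_gpu:.1f} TFlop/s")
print(f"Total GPU memory: {total_gpu_memory_tb:.2f} TB")
```

Under these assumptions the partition holds 576 GPUs, which works out to roughly 19 TFlop/s per GPU for standard HPC workloads and about 312 TFlop/s per GPU for AI workloads.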

Tutors

Mgr. Branislav Jansík, Ph.D. 

Georg Zitzlsberger

Ing. Stanislav Böhm, Ph.D.

Prerequisites

Experience using GPU-accelerated systems.

Language

English

Schedule

Access to Karolina's GPU accelerated part

Branislav Jansík (60 minutes)

  1. Short introduction of the Karolina supercomputer 
  2. How to access the Karolina GPU nodes
  3. First login
  4. Computing environment and available software libraries and tools
  5. HPC resources allocation, PBS
  6. Scratch and Project storages
  7. Special tools (Nodes availability overview, ...)

Efficient multi-GPU and multi-node execution of Deep and Machine Learning frameworks

Georg Zitzlsberger (60 minutes)

  1. Introduction to Data Parallel Deep Learning with Horovod
  2. Multi-node/-GPU aware Data Processing Pipelines
  3. Demonstration of Multi-node/-GPU Examples using TensorFlow
  4. Multi-node/-GPU Machine Learning with scikit-learn
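
For readers new to the topic, the idea behind data parallel training can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Horovod's API and not course material: each simulated worker computes a gradient on its own shard of the data, the gradients are averaged (the role an allreduce plays on a real cluster), and every worker applies the same update. The model, data, and function names are hypothetical.

```python
def gradient(w, xs, ys):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def allreduce_mean(values):
    """Stand-in for an allreduce: every worker would receive this mean."""
    return sum(values) / len(values)


def data_parallel_step(w, shards, lr):
    """One training step: local gradients, averaging, identical update."""
    local_grads = [gradient(w, xs, ys) for xs, ys in shards]
    return w - lr * allreduce_mean(local_grads)


# Toy data generated from y = 3 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [3.0 * x for x in xs]

# Split the data into equally sized shards, one per simulated worker.
NUM_WORKERS = 4
shards = [(xs[r::NUM_WORKERS], ys[r::NUM_WORKERS]) for r in range(NUM_WORKERS)]

w_parallel = 0.0  # trained with sharded data and gradient averaging
w_single = 0.0    # trained on the full batch, for comparison
for _ in range(200):
    w_parallel = data_parallel_step(w_parallel, shards, lr=0.01)
    w_single = w_single - 0.01 * gradient(w_single, xs, ys)

print(f"data parallel: w = {w_parallel:.6f}")
print(f"single worker: w = {w_single:.6f}")
```

With equally sized shards, the averaged gradient equals the full-batch gradient, so both runs follow the same trajectory up to floating-point rounding.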

Introduction to HyperQueue

Stanislav Böhm (45 minutes)

  1. Efficient execution of a large number of small tasks transparently over HPC schedulers (SLURM/PBS) using HyperQueue
  2. Guided examples
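
The motivation for running many small tasks inside one allocation can be illustrated with Python's standard library. The sketch below is not HyperQueue and does not use its interface; it only shows the general pattern in which a fixed pool of workers processes a long list of short tasks, instead of one scheduler job being submitted per task. The task function and the pool size are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def small_task(task_id):
    """Placeholder for a short unit of work (hypothetical)."""
    return task_id * task_id


def run_tasks(task_ids, workers):
    """Run all tasks on a fixed pool of workers and keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(small_task, task_ids))


# 1000 small tasks share 8 workers rather than becoming 1000 separate jobs.
results = run_tasks(range(1000), workers=8)
print(len(results), results[:5])
```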

Acknowledgements

This event is partially supported by the EuroCC project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Germany, Bulgaria, Austria, Croatia, Cyprus, the Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, the United Kingdom, France, the Netherlands, Belgium, Luxembourg, Slovakia, Norway, Switzerland, Turkey, Republic of North Macedonia, Iceland, Montenegro.

This event is partially supported by the LIGATE project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 956137. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and Italy, Sweden, Austria, the Czech Republic, Switzerland.

This course is supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90140). 

If you have any questions, please contact us at:
    • 13:00 - 14:00
      Access to the Karolina GPU accelerated part
      1. Short introduction of the Karolina supercomputer
      2. How to access the Karolina GPU nodes
      3. First login
      4. Computing environment and available software libraries and tools
      5. HPC resources allocation, PBS
      6. Scratch and Project storages
      7. Special tools (Nodes availability overview, ...)
      8. How to run jobs
      Convener: Branislav Jansík (IT4Innovations)
    • 14:00 - 14:15
      Break (15 minutes)
    • 14:15 - 15:15
      Efficient multi-GPU and multi-node execution of Deep and Machine Learning frameworks
      1. Introduction to Data Parallel Deep Learning with Horovod
      2. Multi-node/-GPU aware Data Processing Pipelines
      3. Demonstration of Multi-node/-GPU Examples using TensorFlow
      4. Multi-node/-GPU Machine Learning with scikit-learn
      Convener: Georg Zitzlsberger (IT4Innovations)
    • 15:15 - 16:00
      Introduction to HyperQueue
      1. Efficient execution of a large number of small tasks using HyperQueue, independently of schedulers (SLURM/PBS)
      2. Guided examples
      Convener: Stanislav Böhm (IT4Innovations)