Modern deep learning challenges leverage increasingly large datasets and more complex models. As a result, significant computational power is required to train models effectively and efficiently. Learning to distribute data across multiple GPUs during training opens up a wealth of new deep learning applications.

Using multi-GPU systems effectively also reduces training time, allowing faster application development and shorter iteration cycles. Teams that can train with multiple GPUs have an edge: they build models trained on more data in less time and with greater engineer productivity.

This workshop teaches you techniques for data-parallel deep learning training on multiple GPUs to shorten the training time required for data-intensive applications. Working with deep learning tools, frameworks, and workflows to perform neural network training, you’ll learn how to decrease model training time by distributing data to multiple GPUs while retaining the accuracy of training on a single GPU.

In this workshop, attendees will learn how to:

  • Perform data-parallel deep learning training with multiple GPUs
  • Achieve maximum training throughput to make the best use of multiple GPUs
  • Distribute training to multiple GPUs using PyTorch Distributed Data Parallel (DDP)
  • Understand and utilize algorithmic considerations specific to multi-GPU training performance and accuracy


Tools, libraries, and frameworks: PyTorch, PyTorch Distributed Data Parallel, NVIDIA Collective Communications Library (NCCL)






An NVIDIA developer account is needed prior to the event. Please see the section "Practicalities" below.

Hardware requirements: Desktop or laptop computer capable of running the latest version of Chrome or Firefox. Each participant will be provided with dedicated access to a fully configured, GPU-accelerated workstation in the cloud.


Georg Zitzlsberger is a research specialist for Machine and Deep Learning at IT4Innovations. For over four years he has been certified by NVIDIA as a University Ambassador of the NVIDIA Deep Learning Institute (DLI) program. This certification allows him to offer NVIDIA DLI courses to users of IT4Innovations' HPC services. In addition, in collaboration with Bayncore, he was a trainer for Intel HPC and AI workshops and conferences held across Europe. He has been contributing to these events, which address audiences from industry and academia, for five years. Recently, he also received instructor certifications from Intel for oneAPI-related courses.




This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101101903. The JU receives support from the Digital Europe Programme and Germany, Bulgaria, Austria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, France, Netherlands, Belgium, Luxembourg, Slovakia, Norway, Türkiye, Republic of North Macedonia, Iceland, Montenegro, Serbia. This project has received funding from the Ministry of Education, Youth and Sports of the Czech Republic.

This course was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90254).



Registration is obligatory. Only registered participants will receive the Zoom link. This training is offered only to members of academia.

Once the maximum number of registrations has been reached or the registration form has been closed, you can send us an email asking to be put on the waiting list (vacancies may occur due to cancellations, etc.). E-mail to


This training will be a hybrid event. Technical details about joining will be sent to the accepted registrants before the event. 

Before the workshop, please create an NVIDIA developer account using the same email address as for event registration.

The recommended browser for the course is a recent version of Chrome. Please check beforehand that your laptop will run the course environment smoothly. Make sure that WebSockets work for you: under Environment, WebSockets should be reported as supported, and under WebSockets (Port 80), Data Receive, Send, and Echo Test should all show Yes. If there are issues with WebSockets, try updating your browser.

Capacity and Fees

Capacity is limited to 30 participants, online and onsite combined.

The course is free of charge for participants from academia. Please register with your university or institute e-mail address so that we can check your eligibility. If you are from industry, please contact us so that we can gauge the interest.