[ONLINE] Python for HPC

Europe/Prague
ZOOM (ONLINE)

Description

Annotation

This training introduces participants to Python for high-performance computing, covering parallel programming, performance optimization, and HPC resource utilization. Designed for researchers and developers, the course includes hands-on sessions to enhance practical skills.

Target Audience and Purpose of the Course:

  • Python's role in HPC and performance optimization
  • Parallel programming techniques for efficient computing
  • How to utilize HPC resources effectively
  • Hands-on experience with lab exercises for practical skills

Participants will have access to the Karolina supercomputer for hands-on sessions, using both CPU and GPU resources. Karolina, operational since 2021, is the most powerful supercomputer in the Czech Republic and ranks among Europe's top systems. It features a standard partition of 720 nodes delivering 11.6 PFlop/s for traditional HPC simulations, and an accelerated partition of 72 servers, each equipped with 8 GPU accelerators, delivering up to 360 PFlop/s for AI workloads.

This infrastructure supports complex scientific and industrial challenges, including numerical simulations, data analysis, and artificial intelligence applications.

Level

70% beginner, 30% intermediate

Language

English

Prerequisites

Beginner-level experience with programming in Python

Technical requirements: 

  • Python and its dependencies
  • Jupyter Notebook for interactive coding
  • Anaconda (optional) for managing dependencies

Tutors

Tomas Martinovic is a senior researcher at the Advanced Data Analysis and Simulation Laboratory within the IT4Innovations National Supercomputing Center. His work focuses primarily on data science, data visualisation, and mathematical modelling using statistical methods and deep neural networks.

Ghaith Chaabane is a researcher at the Advanced Data Analysis and Simulation Laboratory within the IT4Innovations National Supercomputing Center.

Acknowledgements

This course is supported by the EXA4MIND project, funded by the European Union’s Horizon Europe research and innovation programme under grant agreement No. 101092944.

EuroCC 2 project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 101101903. The JU receives support from the Digital Europe Programme and Germany, Bulgaria, Austria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, Greece, Hungary, Ireland, Italy, Lithuania, Latvia, Poland, Portugal, Romania, Slovenia, Spain, Sweden, France, Netherlands, Belgium, Luxembourg, Slovakia, Norway, Türkiye, Republic of North Macedonia, Iceland, Montenegro, Serbia. This project has received funding from the Ministry of Education, Youth and Sports of the Czech Republic.

This course was supported by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:90254).

Some of the materials used in this course were originally developed by the Vienna Scientific Cluster (VSC) and are used here under the Creative Commons Attribution-ShareAlike 4.0 International license. The original materials can be accessed here.

All presentations and educational materials of this course are provided under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

    • Welcome and Introduction

      Overview of the workshop objectives.

    • Python Essentials

      Introduction
      Python versions and runtime environments
      Package management and environments
      uv: A fast Python package and project manager written in Rust, designed for quick dependency resolution and installation.
      Ruff: A fast Python linter and code formatter, also written in Rust, that scales well to large projects.

    • Coffee Break
    • Virtual Environments

      Anaconda: Managing environments and packages using Conda, creating and sharing reproducible setups.
      Python venv & virtualenv: Tools for creating lightweight virtual environments (venv ships with the standard library, virtualenv is a third-party package), their differences, and when to use each.

    • Running Python in Batch Mode

      Overview of HPC systems: Introduction to high-performance computing (HPC) clusters, their architecture, and job scheduling systems.
      Using Lmod to load packages: How to use Lmod for environment module management and for dynamically loading and unloading software packages.
      Useful commands for the batch system
      Example Python batch scripts

    • Lunch Break
    • Benchmarking and Profiling

      Performance measurement techniques

    • NumPy

      Efficient numerical computations: Leveraging NumPy for high-performance array operations, vectorization, and avoiding performance pitfalls.

    • Q&A and Wrap-up
  • Wednesday, 30 April
    • Recap of Day 1

      Summary of key concepts covered on Day 1

    • Cython

      Introduction
      Combining NumPy and Cython: Overview of Cython and its role in accelerating Python code and optimizing numerical computations with NumPy.

    • Dask

      Parallel computing with Dask: Introduction to Dask and its use for parallelizing NumPy, pandas, and custom workflows.

    • Coffee Break
    • Numba

      Just-in-time compilation for Python: Using Numba to compile Python functions for near-C speed

    • Lunch Break
    • SLURM and MPI

      SLURM: Job scheduling with srun and sbatch.
      MPI with mpi4py: Introduction to MPI and using mpi4py for parallel programming.

    • Containerization and Distributed Computing

      Introduction to containerization for HPC environments
      Apptainer: A containerization tool for reproducible environments.
      Ray: A framework for building and running distributed applications.

    • Coffee Break
    • Exercise

      Hands-on practice

    • Q&A and Closing Remarks
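The Benchmarking and Profiling and NumPy sessions in the agenda both come down to measuring Python-level overhead and then removing it. A minimal sketch of that idea, assuming only NumPy and the standard-library `timeit` module (the function names, array size, and repeat count are illustrative choices, not course material), times an explicit Python loop against its vectorized NumPy equivalent:

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreted iteration per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_numpy(arr):
    """Vectorized version: the loop runs inside NumPy's compiled code."""
    return float(np.dot(arr, arr))


def benchmark(n=1_000_000, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for each implementation."""
    arr = np.random.default_rng(0).random(n)
    values = arr.tolist()  # plain Python floats for the loop version
    t_loop = min(timeit.repeat(lambda: sum_of_squares_loop(values), number=1, repeat=repeat))
    t_numpy = min(timeit.repeat(lambda: sum_of_squares_numpy(arr), number=1, repeat=repeat))
    return t_loop, t_numpy


if __name__ == "__main__":
    t_loop, t_numpy = benchmark()
    print(f"loop:  {t_loop:.4f} s")
    print(f"numpy: {t_numpy:.4f} s")
```

Taking the minimum of several repeats, as the `timeit` documentation suggests, filters out noise from other processes. The actual speed-up depends on the machine and the array size, so no figure is claimed here.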
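For the profiling part of the Benchmarking and Profiling session, the standard library's `cProfile` and `pstats` modules report where time is spent per function. A minimal sketch, in which `parse`, `transform`, `slow_pipeline`, and `profile_report` are hypothetical helpers standing in for real workload code:

```python
import cProfile
import io
import pstats


def parse(n):
    """Hypothetical stage 1: build a list of numeric strings."""
    return [str(i) for i in range(n)]


def transform(items):
    """Hypothetical stage 2: convert the strings back to ints and square them."""
    return [int(s) ** 2 for s in items]


def slow_pipeline(n=200_000):
    """Hypothetical workload whose hot spots we want to locate."""
    return sum(transform(parse(n)))


def profile_report(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_report(slow_pipeline)
    print(report)
```

Profiling first and optimizing second keeps effort focused on the functions that dominate the run time, which is usually where NumPy, Cython, or Numba pay off.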
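For the batch-mode and SLURM sessions, one common pattern is to let a Python script size its worker pool from the job allocation instead of hard-coding it. A minimal sketch, assuming the job was submitted with `sbatch --cpus-per-task=...` so that Slurm exports `SLURM_CPUS_PER_TASK` (the helper name `slurm_job_info` is hypothetical):

```python
import os


def slurm_job_info(environ=None):
    """Read a few standard Slurm variables, falling back to defaults
    when the script runs outside a Slurm job (e.g. on a laptop)."""
    env = os.environ if environ is None else environ
    cpus = env.get("SLURM_CPUS_PER_TASK")
    return {
        "job_id": env.get("SLURM_JOB_ID", "interactive"),
        # Prefer the allocation size: os.cpu_count() may report every core
        # on the node, not just the ones granted to this job.
        "n_workers": int(cpus) if cpus else (os.cpu_count() or 1),
    }


if __name__ == "__main__":
    info = slurm_job_info()
    print(f"Job {info['job_id']}: using {info['n_workers']} worker(s)")
```

The resulting `n_workers` value can then be passed to whichever parallel backend the script uses, for example `concurrent.futures.ProcessPoolExecutor(max_workers=...)`.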