Abstract
With the increasing prevalence of multicore processors, shared-memory programming models are essential. OpenMP is a popular, portable, widely supported, and easy-to-use shared-memory model. Developers usually find OpenMP easy to learn. However, they are often disappointed with the performance and scalability of the resulting code. This disappointment stems not from shortcomings of OpenMP itself but from the lack of depth with which it is employed. Our “Advanced OpenMP Programming” tutorial addresses this critical need by exploring the implications of possible OpenMP parallelization strategies, both in terms of correctness and performance. We assume attendees understand basic parallelization concepts and know the fundamentals of OpenMP. We focus on performance aspects such as data and thread locality on NUMA architectures, false sharing, and exploitation of vector units, and we present the directives for attached compute accelerators. All topics are accompanied by extensive case studies, and we discuss the corresponding language features in depth.
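As a small illustration of the kind of performance pitfall the tutorial discusses (this sketch is not part of the tutorial material itself), the following C example contrasts a naive per-thread counter array, whose adjacent elements typically share a cache line and therefore suffer from false sharing, with a version that lets OpenMP's reduction clause keep a private partial sum per thread. The function names and the problem size N are placeholders chosen for illustration only.

    /* Illustrative sketch (not from the tutorial): false sharing vs. reduction.
     * Compile e.g. with: gcc -fopenmp -O2 false_sharing_demo.c */
    #include <omp.h>
    #include <stdio.h>

    #define N 100000000L

    /* Naive version: all threads update adjacent elements of 'sums',
     * which usually lie on the same cache line -> false sharing. */
    static long sum_false_sharing(void) {
        long sums[64] = {0};            /* one slot per thread; assumes <= 64 threads */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            #pragma omp for
            for (long i = 0; i < N; ++i)
                sums[tid] += i & 1;     /* neighboring slots ping-pong between caches */
        }
        long total = 0;
        for (int t = 0; t < 64; ++t)
            total += sums[t];
        return total;
    }

    /* Preferred version: OpenMP keeps a private partial sum per thread
     * and combines them at the end via the reduction clause. */
    static long sum_reduction(void) {
        long total = 0;
        #pragma omp parallel for reduction(+:total)
        for (long i = 0; i < N; ++i)
            total += i & 1;
        return total;
    }

    int main(void) {
        double t0 = omp_get_wtime();
        long a = sum_false_sharing();
        double t1 = omp_get_wtime();
        long b = sum_reduction();
        double t2 = omp_get_wtime();
        printf("false sharing: %ld in %.3f s\n", a, t1 - t0);
        printf("reduction:     %ld in %.3f s\n", b, t2 - t1);
        return 0;
    }

Both functions compute the same result; on a typical multicore machine the reduction version scales noticeably better because each thread accumulates into its own private variable instead of contending for a shared cache line.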
Purpose of the course (benefits for the attendees)
The course will improve users' understanding of OpenMP and its new features, enabling them to make the best use of the standard on modern computer architectures.
About the tutor(s)
Christian Terboven is a senior scientist and leads the HPC group at RWTH Aachen University. His research interests center on parallel programming and related software engineering aspects. Dr. Terboven has been involved in the analysis, tuning, and parallelization of several large-scale simulation codes for various architectures. He is responsible for several research projects in the area of programming models and approaches to improve the productivity and efficiency of modern HPC systems. He is a co-author of the book Using OpenMP – The Next Step.
Tim Cramer is a post-doctoral researcher in the HPC group at RWTH Aachen University, Germany. He earned his doctoral degree in Computer Science from RWTH Aachen University in 2017. His research focuses on correctness and performance analysis of parallel programs for homogeneous and heterogeneous architectures. As a member of the High-Performance Computing group, Dr. Cramer is involved in supporting users of the RWTH Compute Cluster and in the development of compiler and runtime support for OpenMP target offloading.
Acknowledgements
This work was supported by The Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project “IT4Innovations National Supercomputing Center – LM2015070”.