InfiniBand (IB), Omni-Path, and High-speed Ethernet (HSE) technologies are generating a lot of excitement toward building next generation High-End Computing (HEC) systems including clusters, datacenters, file systems, storage, cloud computing and Big Data (Hadoop, Spark, HBase, and Memcached) environments. RDMA over Converged Enhanced Ethernet (RoCE) technology is also emerging. This tutorial will provide an overview of these emerging technologies, their offered architectural features, their current market standing, and their suitability for designing HEC systems.
It will start with a brief overview of IB, Omni-Path, and HSE. An in-depth overview of the architectural features of IB, Omni-Path, and HSE (including iWARP and RoCE), their similarities and differences, and the associated protocols will be presented. Next, an overview of the OpenFabrics stack, which encapsulates IB, HSE, and RoCE (v1/v2) in a unified manner, will be presented. An overview of the Libfabric (OFI) stack will also be provided. Hardware/software solutions and the market trends behind IB, Omni-Path, HSE, and RoCE will be highlighted. Finally, sample performance numbers of these technologies and protocols for different environments will be presented.
Remark: The course will be followed by an advanced treatment of these topics in the course InfiniBand, Omni-Path, and High-Speed Ethernet: Advanced Features, Challenges in Designing HEC Systems, and Usage (separate registration needed).
Purpose of the course (benefits for the attendees)
The goals and benefits of this tutorial are as follows:
- Making the attendees familiar with the IB, Omni-Path, and HSE architectures and the associated benefits.
- Demonstrating how the OpenFabrics stack provides a unified interface across these standards.
- Providing an overview of available IB, Omni-Path, and HSE hardware/software solutions.
- Presenting sample performance numbers that show trends in various environments and how these environments take advantage of IB, Omni-Path, and HSE features.
In summary, the tutorial aims to make the attendees familiar with IB, Omni-Path, and HSE, their benefits, and the hardware/software solutions available for these standards; to present the latest trends in designing high-end computing, networking, storage, cloud computing, and Big Data systems with these standards; and to provide a critical assessment of whether different IB, Omni-Path, and HSE products are ready for prime time.
About the tutor(s)
Dhabaleswar K. (DK) Panda is a Professor and University Distinguished Scholar of Computer Science at the Ohio State University. He obtained his Ph.D. in computer engineering from the University of Southern California. His research interests include parallel computer architecture, high-performance computing, communication protocols, file systems, network-based computing, Big Data, and Deep Learning. He has published over 400 papers in major journals and international conferences related to these research areas.
Dr. Panda and his research group members have been doing extensive research on modern networking technologies including InfiniBand, Omni-Path, HSE, and RDMA over Converged Enhanced Ethernet (RoCE). His research group is currently collaborating with National Laboratories and leading InfiniBand, Omni-Path, and Ethernet/iWARP companies on designing various subsystems of next-generation high-end systems. The MVAPICH2 (High-Performance MPI over InfiniBand, Omni-Path, iWARP, and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,800 organizations worldwide (in 85 countries). These libraries are available from http://mvapich.cse.ohio-state.edu. This software has enabled several InfiniBand clusters (including the 1st one) to get into the latest TOP500 ranking. These software packages are also available with the OpenFabrics stack for network vendors (InfiniBand, Omni-Path, and iWARP), server vendors, and Linux distributors. The RDMA-enabled Apache Hadoop, Spark, and Memcached packages, consisting of acceleration for HDFS, MapReduce, RPC, and Memcached, and support for clusters with Lustre file systems, are publicly available from http://hibd.cse.ohio-state.edu. These libraries are being used by more than 245 organizations in 31 countries.
The group has also been focusing on co-designing Deep Learning Frameworks and MPI Libraries. A high-performance and scalable version of the Caffe framework is available from the High-Performance Deep Learning (HiDL) Project site (http://hidl.cse.ohio-state.edu). Dr. Panda's research is supported by funding from US National Science Foundation, US Department of Energy, US Department of Defense, and several industry sponsors including Intel, Cisco, SUN, Mellanox, QLogic, Microsoft, NVIDIA and NetApp. He is an IEEE Fellow and a member of ACM.
More details about Dr. Panda, including a comprehensive CV and publications are available at: http://web.cse.ohio-state.edu/~panda.2/
Dr. Hari Subramoni has been a research scientist in the Department of Computer Science and Engineering at the Ohio State University, USA, since September 2015. His current research interests include high-performance interconnects and protocols, parallel computer architecture, network-based computing, exascale computing, network-topology-aware computing, QoS, power-aware LAN-WAN communication, fault tolerance, virtualization, Big Data, and cloud computing. He has published over 50 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. Dr. Subramoni is doing research on the design and development of the MVAPICH2 (High-Performance MPI over InfiniBand, iWARP, and RoCE) and MVAPICH2-X (Hybrid MPI and PGAS (OpenSHMEM, UPC, and CAF)) software packages. He is a member of IEEE.
More details about Dr. Subramoni are available at: http://www.cse.ohio-state.edu/~subramon