The webinar will present a novel data management and storage platform for exascale computing based on hierarchical storage management (HSM) and ephemeral data life-cycle management. The platform aims to make efficient use of storage tiers. It addresses critical aspects of intelligent data placement for extreme volumes of data, ensuring that the right storage tiers are used and that data is accessed through data nodes located as close as possible to the compute nodes, optimising performance, cost, and energy at extreme scale. The methods and techniques presented are applicable to exascale-class data-intensive applications and workflows that need to be deployed in highly heterogeneous computing environments. The system requirements are driven by six data-intensive use cases, which will be introduced during the second half of the webinar.
Target Audience and Purpose of the Webinar
The webinar is suitable for the general public and aims to introduce the domain of HSM, with a focus on the efficient use of storage tiers. It will introduce the concept of ephemeral data nodes and data accessors, which allow users to operate the system flexibly through various well-known data access paradigms, such as POSIX namespaces, S3/Swift interfaces, MPI-IO and other middleware, data formats and protocols. The main concepts will be illustrated through the six data-intensive use cases to give a clearer understanding of the whole approach.
Prerequisites: a personal computer suitable for online education.
Agenda and Content of the Webinar
1. Philippe Couvée: IO-SEA introduction, concepts and challenges
The IO-SEA project’s objective is to build a novel approach to the I/O stack of exascale-class supercomputers, based on ephemeral I/O services that are tailored to and launched specifically for individual workflows, run on dedicated “data nodes”, and provide access to users’ datasets stored in an object store. In this talk, we will present the main concepts and discuss the challenges of this architecture.
2. Olivier Iffrig: Addressing data storage constraints in numerical weather prediction
Numerical weather prediction (NWP) workflows present a double challenge in terms of I/O: the volume of data produced is considerable, and its value decreases sharply with time, which puts harsh constraints on storage capacity and throughput. I will present the challenges and lessons learnt at ECMWF and discuss how IO-SEA can improve data handling and data management.
3. Damien Chapon: RAMSES: Modelling astrophysical systems
The RAMSES code models self-gravitating astrophysical plasma flows on a distributed adaptive mesh for various scientific applications. To address challenging I/O bottlenecks, it already integrates the HERCULE I/O library, and we will further test a hierarchical storage infrastructure in the IO-SEA context.
4. Jiri Novacek: Remote on-the-fly analysis and cloudification of cryo-electron microscopy data
Electron cryo-microscopy (cryo-EM) is used to determine near-atomic-resolution structures of larger proteins or protein–nucleic acid complexes under near-native conditions. In this talk, I will describe a framework developed to perform real-time data analysis on remote HPC resources, which also facilitates data sharing and publication under the FAIR principles.
5. Eric Gregory: LQCD: Particle physics from a supercomputer
Lattice quantum chromodynamics (LQCD) is a framework for calculating the properties of particles built from quarks and gluons by simulating the interaction of quantum fields in a discretized box of space-time. I will discuss the computing and I/O challenges facing the research field.
6. Ghazal Tashakor: I/O software for the Terrestrial Systems Modelling Platform (TSMP)
The Terrestrial Systems Modelling Platform (TSMP) is a fully coupled, scale-consistent, highly modular, and massively parallel regional Earth System Model. TSMP has been selected as one of the IO-SEA use cases, both to be optimised for the IO-SEA storage infrastructure and to test the IO-SEA technical solutions.
About the tutors
Philippe Couvée, Atos
Philippe Couvée has been working in Atos HPC R&D (Grenoble, France) for more than 20 years. He leads a team of 17 researchers and engineers developing products that facilitate data access on large supercomputers. His recent focus is on data-centric solutions that combine caching and acceleration techniques with advanced instrumentation and data analytics. He also teaches computer architecture and system programming at CNAM.
Damien Chapon, CEA
Dr. Damien Chapon has been an engineer-researcher at CEA Saclay, France, since 2014. He received a PhD in astronomy and astrophysics from Paris-Diderot University, France, in 2011. His current research activities focus on developing parallel scientific data analysis and visualisation tools, adaptive mesh refinement data models, and data compression. He is also the core developer of the Galactica simulation database for astrophysicists, a major Open Science initiative.
Eric Gregory, FZJ
Dr. Eric B. Gregory is on the scientific staff at the Jülich Supercomputing Centre in Jülich, Germany. He received a PhD in theoretical particle physics from Syracuse University, USA. His current research focuses on performing large-scale computer simulations to understand how the interaction of quarks and gluons gives rise to the observed properties of hadrons, such as the proton.
Olivier Iffrig, ECMWF
Dr. Olivier Iffrig is a computational scientist at the European Centre for Medium-Range Weather Forecasts (ECMWF). He received a PhD in computational astrophysics from Paris-Saclay University in 2016. His current activities revolve around the storage and processing of meteorological data, in particular their performance and scalability. He is also a core developer of the Data Access and Storage Interface (DASI), a semantic data management engine for IO-SEA.
Jiri Novacek, CEITEC
Jiri Novacek is the head of the Cryo-electron Microscopy and Tomography Core Facility at CEITEC Masaryk University (Brno, Czech Republic). He received his PhD in Biomolecular Chemistry in 2013. His current activities revolve around automating cryo-electron microscopy data acquisition and analysis. He also focuses on implementing new workflows that involve cryo-EM.
Ghazal Tashakor, FZJ
Ghazal Tashakor joined the Division of Computational Science at FZJ in Jülich as a postdoctoral researcher in 2021. She received her PhD in HPC and advanced simulation from the Autonomous University of Barcelona in 2019. In recent years, her research has been interdisciplinary, spanning academic and industrial investigation of distributed and parallel architecture patterns, from Grid computing to data visualisation and monitoring on Big Data and advanced hierarchical models.
This work was supported by the IO-SEA project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955811. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from France, Germany, the United Kingdom, Ireland, the Czech Republic and Sweden. This project has also received funding from the Ministry of Education, Youth and Sports of the Czech Republic (ID: MC2105).