The webinar will present a novel data management and storage platform for exascale computing based on hierarchical storage management (HSM) and ephemeral data life-cycle management. The aim of the platform is to enable efficient use of storage tiers. Critical aspects of intelligent data placement are considered for extreme volumes of data. This ensures that the right resources among the storage tiers are used and that data is accessed through data nodes as close as possible to the compute nodes, optimizing performance, cost, and energy at extreme scale. The methods and techniques presented are applicable to exascale-class, data-intensive applications and workflows that need to be deployed in highly heterogeneous computing environments. System requirements are driven by six data-intensive use cases, which will be introduced during the second half of the webinar.
Target Audience and Purpose of the Webinar
The webinar is suitable for the general public. Its purpose is to introduce the domain of HSM, with a focus on efficient use of storage tiers. It will introduce the concept of ephemeral data nodes and data accessors, which allow users to operate the system flexibly through well-known data access paradigms such as POSIX namespaces, S3/Swift interfaces, MPI-IO, and other middleware, data formats, and protocols. The main concept will then be illustrated through the six data-intensive use cases to give a better understanding of the whole approach.
Participants need a personal computer suitable for online education.
Agenda and Content of the Webinar
1. Philippe Couvee: IO-SEA introduction, concepts, and challenges
The IO-SEA project’s objective is to build a novel approach to the I/O stack of exascale-class supercomputers, based on ephemeral I/O services that are tailored to and launched specifically for workflows, run on dedicated “data nodes”, and provide access to the users’ datasets stored in an object store. In this talk, we will present the main concepts and discuss the challenges of this architecture.
2. James Hawkes: Addressing data storage constraints in numerical weather prediction
NWP workflows present a double challenge in terms of I/O: the volume of data produced is considerable, and its associated value decreases sharply with time, putting harsh constraints on storage capacity and throughput. I will present the challenges and lessons learned at ECMWF and discuss how IO-SEA can improve data handling and data management.
3. Damien Chapon: RAMSES: modeling astrophysical systems
Self-gravitating astrophysical plasma flows are modeled in the RAMSES code on a distributed adaptive mesh for various scientific applications. To address challenging I/O bottlenecks, the code already integrates the HERCULE I/O library, and we will further test hierarchical storage infrastructure in the IO-SEA context.
4. Jiri Novacek: Remote on-the-fly analysis and cloudification of the cryo-electron microscopy data
Electron cryo-microscopy (cryo-EM) is used to determine near-atomic structures of larger proteins or protein-nucleic acid complexes under near-native conditions. I will describe a framework that was developed to perform real-time data analysis on remote HPC resources and that also facilitates data sharing and publication under the FAIR principles.
5. Eric Gregory: LQCD: Particle physics from a supercomputer
Lattice quantum-chromodynamics is a framework to calculate the properties of particles built of quarks and gluons by simulating the interaction of quantum fields in a discretized box of space-time. I will discuss the computing and I/O challenges facing the research field.
6. Ghazal Tashakor: IO – Software for Terrestrial Systems Modelling Platform (TSMP)
The Terrestrial Systems Modelling Platform (TSMP) is a fully coupled, scale-consistent, highly modular, and massively parallel regional Earth System Model. TSMP was selected as one of the IO-SEA use cases so that it can be optimized for the IO-SEA storage infrastructure and used to test the IO-SEA technical solutions.
About the tutors
Philippe Couvee, Atos
Philippe Couvée has been working for Atos HPC R&D (Grenoble, France) for more than 20 years. He leads a team of 17 researchers and engineers developing products to facilitate data access from large supercomputers. His recent focus is on data-centric solutions that combine cache and acceleration techniques with advanced instrumentation and data analytics. He also teaches computer architecture and system programming at the CNAM school.
Damien Chapon, CEA
Dr. Damien Chapon has been an engineer-researcher at CEA Saclay, France, since 2014. He received a Ph.D. in astronomy and astrophysics in 2011 from the Paris-Diderot University, France. His current research activities focus on the development of parallel scientific data analysis and visualization tools, adaptive mesh refinement data models, and data compression. He is also the core developer of the Galactica simulation database for astrophysicists, a major Open Science initiative.
Eric Gregory, FZJ
Dr. Eric B. Gregory is on the scientific staff at the Juelich Supercomputing Centre in Juelich, Germany. He received a Ph.D. in theoretical particle physics from Syracuse University, USA. His current research focuses on performing large-scale computer simulations to understand how the interaction of quarks and gluons gives rise to the observed properties of hadrons, such as the proton.
James Hawkes, ECMWF
Dr. James Hawkes is a computational scientist at the European Centre for Medium-Range Weather Forecasts and team leader of the Data Processing Services team. He received a Ph.D. in engineering, specialising in massively asynchronous linear solvers, from the University of Southampton in 2017. His current activities revolve around the storage and processing of large-scale, time-critical meteorological data, in particular in terms of performance and scalability. He leads work package 5 in IO-SEA, driving the development of the Data Access and Storage Interface (DASI), a semantic data management engine for IO-SEA.
Jiri Novacek, CEITEC
Jiri Novacek is the head of the Cryo-electron microscopy and tomography core facility at CEITEC Masaryk University (Brno, Czech Republic). He received a Ph.D. in Biomolecular Chemistry in 2013. His current activities revolve around the automation of cryo-electron microscopy data acquisition and analysis. He also focuses on the implementation of new workflows that involve cryo-EM.
Ghazal Tashakor, FZJ
Ghazal Tashakor joined the Division of Computational Science at FZJ in Jülich in 2021 as a postdoctoral researcher. She received her Ph.D. in HPC and advanced simulation from the Autonomous University of Barcelona in 2019. In recent years, her research has focused on the interdisciplinary academic and industrial investigation of distributed and parallel architecture patterns, from Grid computing to data visualization and monitoring, built on big data and advanced hierarchical models.
This work was supported by the IO-SEA project. This project has received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955811. The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from France, Germany, the United Kingdom, Ireland, the Czech Republic, and Sweden. The project has also received funding from the Ministry of Education, Youth and Sports of the Czech Republic (ID: MC2105).