Speaker
Description
We summarize our progress towards an efficient multi-GPU solver for single-phase fluid flow based on the lattice Boltzmann method (LBM). LBM is well suited to massive parallelization on GPUs, and a single-GPU implementation in the CUDA framework is straightforward. Such an implementation can be extended to multiple GPUs with a standard domain decomposition approach, implemented either as a single-process solver using a threading library such as OpenMP or as a multi-process solver using message passing. We have recently extended our earlier single-process LBM solver into a multi-process solver based on the MPI library so that it can use many GPUs spread across multiple nodes of an HPC cluster. We describe the optimization strategies and techniques used in the solver, analyze its efficiency, and show the results of preliminary high-resolution simulations for problems related to our research projects.
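To illustrate the domain decomposition idea, here is a minimal sketch in plain Python. It is not the solver described above. It assumes a one-dimensional D1Q3 lattice with BGK collision and periodic boundaries, runs in a single process, and replaces MPI message passing with direct copies between slabs. All names (`collide`, `split`, `exchange_halos`, `run`) and the relaxation rate `OMEGA` are hypothetical choices made for the example. In a multi-GPU code, each slab would presumably live on its own GPU and the halo copy would become a send/receive pair between neighbouring processes.

```python
# D1Q3 lattice: rest, right-moving and left-moving populations.
C = (0, 1, -1)
W = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)
OMEGA = 1.0  # hypothetical BGK relaxation rate, chosen only for the demo


def collide(f, nodes):
    """Relax populations f[q][x] towards equilibrium on the given nodes."""
    for x in nodes:
        rho = f[0][x] + f[1][x] + f[2][x]
        u = (f[1][x] - f[2][x]) / rho
        for q in range(3):
            cu = C[q] * u
            feq = W[q] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * u * u)
            f[q][x] += OMEGA * (feq - f[q][x])


def step_global(f):
    """Reference: one collide-and-stream step on the whole periodic domain."""
    n = len(f[0])
    collide(f, range(n))
    return [[f[q][(x - C[q]) % n] for x in range(n)] for q in range(3)]


def split(f, parts):
    """Cut the domain into equal slabs, each padded with one ghost node per side.

    Assumes the number of nodes is divisible by `parts`.
    """
    m = len(f[0]) // parts
    return [[[0.0] + f[q][r * m:(r + 1) * m] + [0.0] for q in range(3)]
            for r in range(parts)]


def exchange_halos(subs):
    """Copy boundary nodes into the neighbours' ghost nodes (stand-in for MPI)."""
    p = len(subs)
    for r in range(p):
        left, right = subs[(r - 1) % p], subs[(r + 1) % p]
        for q in range(3):
            subs[r][q][0] = left[q][-2]    # last interior node of left neighbour
            subs[r][q][-1] = right[q][1]   # first interior node of right neighbour


def step_decomposed(subs):
    """One step per slab: collide interior, exchange halos, then pull-stream."""
    for s in subs:
        collide(s, range(1, len(s[0]) - 1))
    exchange_halos(subs)
    return [[[0.0] + [s[q][i - C[q]] for i in range(1, len(s[0]) - 1)] + [0.0]
             for q in range(3)] for s in subs]


def gather(subs):
    """Concatenate slab interiors back into one global array per population."""
    return [[v for s in subs for v in s[q][1:-1]] for q in range(3)]


def initial_state(n):
    """Fluid at rest with a small density bump in the middle of the domain."""
    rho = [1.1 if n // 3 <= x < 2 * n // 3 else 1.0 for x in range(n)]
    return [[W[q] * rho[x] for x in range(n)] for q in range(3)]


def run(n=12, parts=3, steps=20):
    """Advance the same problem with and without decomposition."""
    ref = initial_state(n)
    subs = split(initial_state(n), parts)
    for _ in range(steps):
        ref = step_global(ref)
        subs = step_decomposed(subs)
    return ref, gather(subs)
```

Calling `ref, dec = run()` advances the same initial state once on the undivided domain and once on three slabs; the two results should agree node by node, which is the basic correctness check for any halo exchange. The sketch exchanges all three populations for simplicity, although only the populations that actually cross a slab boundary need to be communicated.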