Description
Simulating wave propagation with the Fourier collocation method
is computationally intensive due to its reliance on discrete Fourier
transforms (DFTs). While DFTs enable near-minimal spatial
discretization, they scale poorly on modern high-performance
computing systems. This work evaluates two multi-GPU strategies for
three-dimensional simulations: a Global FFT approach using
distributed transforms and a Local FFT approach based on domain
decomposition with halo exchanges. Experiments were performed
on a system with 8 NVIDIA A100 GPUs connected via NVSwitch.
Precision tests show that the Local FFT approach maintains errors
around 0.1% when the halo covers the local PML region.
Performance results demonstrate that the Local FFT approach achieves
lower runtimes and significantly reduced communication overhead
compared to the Global FFT approach, particularly for larger
domains. These findings indicate that Local FFT decomposition is a
promising strategy for scalable, large-scale multi-node ultrasound
simulations.
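
To make the contrast between the two strategies concrete, the following is a minimal one-dimensional, single-process sketch in Python with NumPy. It is not the implementation evaluated in this work. The function names, the raised-cosine taper applied across the halo, and all parameter values are assumptions chosen for illustration, and the periodic index gather only stands in for a halo exchange between GPUs. The global variant differentiates the whole field with one FFT. The local variant pads each subdomain with halo samples from its neighbours, tapers the halo so the padded block is approximately periodic, applies a small FFT per block, and keeps only the interior.

```python
import numpy as np

def spectral_derivative(u, dx):
    """First derivative of a periodic 1D signal via Fourier collocation."""
    k = 2.0 * np.pi * np.fft.fftfreq(u.size, d=dx)  # angular wavenumbers
    return np.real(np.fft.ifft(1j * k * np.fft.fft(u)))

def local_fft_derivative(u, dx, n_parts, halo):
    """Derivative computed block by block with overlapping halos.

    Hypothetical scheme: each block is extended by `halo` samples on both
    sides and multiplied by a raised-cosine taper that equals 1 on the
    interior and falls towards 0 across the halo.
    """
    n = u.size
    size = n // n_parts  # interior samples per block (assumes n % n_parts == 0)
    ramp = 0.5 * (1.0 - np.cos(np.pi * (np.arange(halo) + 0.5) / halo))
    window = np.concatenate([ramp, np.ones(size), ramp[::-1]])
    out = np.empty(n)
    for p in range(n_parts):
        # Stand-in for a halo exchange: gather neighbour samples periodically.
        idx = np.arange(p * size - halo, (p + 1) * size + halo) % n
        d_block = spectral_derivative(u[idx] * window, dx)
        out[p * size:(p + 1) * size] = d_block[halo:halo + size]  # drop halo
    return out

# Usage: a sine wave whose period does not fit the blocks evenly.
n = 256
dx = 1.0 / n
x = np.arange(n) * dx
u = np.sin(2.0 * np.pi * 5.0 * x)
exact = 2.0 * np.pi * 5.0 * np.cos(2.0 * np.pi * 5.0 * x)

d_global = spectral_derivative(u, dx)
d_local = local_fft_derivative(u, dx, n_parts=4, halo=32)

rel_err = np.max(np.abs(d_local - exact)) / np.max(np.abs(exact))
print("global max abs error:", np.max(np.abs(d_global - exact)))
print("local  max rel error:", rel_err)
```

In this sketch the global transform needs the whole field in one place, which is what forces all-to-all communication in a distributed FFT, whereas each local block only needs `halo` samples from its two neighbours. The price is an approximation error that depends on the halo width and on the taper, so a sketch like this is useful mainly for exploring that trade-off at small scale.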