Revisiting Exponential Integrator Methods for HPC with a Mini-Application
Event Type: Workshop
Tags: Algorithms, Extreme Scale Computing, Performance/Productivity Measurement and Evaluation, Scalable Computing, Scientific Computing
Time: Thursday, 12 November 2020, 1:05pm - 1:30pm EDT
Location: Track 8
Description
In this work we employ communication-avoiding techniques, commonly used in Krylov methods, in the context of exponential integrators for the solution of stiff partial differential equations. We choose an exponential integrator method based on polynomial approximations, rather than one based on Krylov methods, to improve strong scaling by reducing the all-to-all communications prevalent in iterative Krylov solvers. We implement this within the published TeaLeaf mini-app, which is parallelized with MPI+OpenMP and also has an MPI+CUDA implementation.
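To make the communication argument concrete, the sketch below performs one exponential Euler step, u_{n+1} = u_n + dt*phi_1(dt*A)(A u_n), where phi_1(dt*A) is applied through a Chebyshev polynomial of the operator, so the update reduces to repeated matrix-vector products (nearest-neighbour halo exchanges in a distributed setting) with no global reductions. This is a minimal serial illustration under stated assumptions, not the CPEXI implementation of the paper; the 1D heat-equation operator, spectral bounds and polynomial degree are chosen purely for illustration.

```python
# Minimal sketch (not the authors' CPEXI code) of an exponential Euler step
# where phi_1(dt*A) is applied via a Chebyshev polynomial of the operator.
# The step needs only mat-vecs, i.e. halo exchanges in parallel, and no
# global reductions, unlike the dot products inside Krylov iterations.
import numpy as np

def phi1(z):
    """phi_1(z) = (exp(z) - 1)/z, with the removable singularity at z = 0 handled."""
    z = np.asarray(z, dtype=float)
    safe = np.where(z == 0.0, 1.0, z)
    return np.where(np.abs(z) > 1e-12, np.expm1(z) / safe, 1.0)

def chebyshev_coeffs(f, a, b, degree):
    """Chebyshev interpolation coefficients of f on [a, b]."""
    k = np.arange(degree + 1)
    x = np.cos(np.pi * (k + 0.5) / (degree + 1))      # Chebyshev nodes in (-1, 1)
    y = f(0.5 * (b - a) * x + 0.5 * (b + a))          # f at nodes mapped to [a, b]
    return np.polynomial.chebyshev.chebfit(x, y, degree)

def apply_cheb_poly(M, v, coeffs, a, b):
    """Evaluate p(M) v with the three-term Chebyshev recurrence (mat-vecs only)."""
    alpha, beta = 2.0 / (b - a), -(b + a) / (b - a)   # affine map of [a, b] onto [-1, 1]
    t_prev = v
    t_curr = alpha * (M @ v) + beta * v
    result = coeffs[0] * t_prev + coeffs[1] * t_curr
    for c in coeffs[2:]:
        t_next = 2.0 * (alpha * (M @ t_curr) + beta * t_curr) - t_prev
        result = result + c * t_next
        t_prev, t_curr = t_curr, t_next
    return result

# Illustrative stiff problem: 1D heat equation u' = A u with a second-order
# finite-difference Laplacian and zero Dirichlet boundary conditions.
n, dt = 200, 1e-4
h = 1.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / h**2

# Gershgorin bound on the spectrum of dt*A (lies on the negative real axis).
a, b = -4.0 * dt / h**2, 0.0
coeffs = chebyshev_coeffs(phi1, a, b, degree=30)

u = np.sin(np.pi * h * np.arange(1, n + 1))               # initial condition
u = u + dt * apply_cheb_poly(dt * A, A @ u, coeffs, a, b)  # one exponential Euler step
```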
We assess the scalability of our implementations on AWE's Damson Bull Sequana X1000 system up to 1,024 nodes (36,864 cores), on AWE's Bullace system, whose nodes have attached NVIDIA V100 GPUs, and on EPCC's Fulhame HPE Apollo 70. We find that our port of TeaLeaf using an exponential Euler method (CPEXI) scales well, in particular when configured in hybrid MPI+OpenMP mode, where it achieves a parallel efficiency of 0.57 on 36,864 cores. This is better than the comparable communication-avoiding Krylov iterative solvers, the best of which (CPPCG) achieves 0.36 at the same core count. Our GPU experiments show encouraging speedup and scaling, with minimal overhead.
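For context, strong-scaling parallel efficiency is conventionally defined relative to a baseline run; the baseline core count used in the study is not stated in this abstract, so the expression below is only the standard definition:

$$ E(p) = \frac{p_0\, T(p_0)}{p\, T(p)} $$

where $T(p)$ is the runtime on $p$ cores and $p_0$ is the baseline core count, so an efficiency of 0.57 at $p = 36{,}864$ means that run retains 57% of the baseline's per-core throughput.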