Introducing Multi-Level Parallelism, at Coarse, Fine, and Instruction Level to Enhance the Performance of Iterative Solvers for Large Sparse Linear Systems on Multi- and Many-Core Architecture
Event Type
Workshop
Extreme Scale Computing
Heterogeneous Systems
Parallel Programming Languages, Libraries, and Models
Portability
Resource Management and Scheduling
Scalable Computing
Time: Wednesday, 11 November 2020, 11:40am - 12:05pm EDT
Location: Track 1
Description: Multi-core and many-core systems are now a common feature of new hardware architectures. The very large number of cores at the processor level requires multi-level parallelism to take full advantage of the available computing power. The resulting programming effort can be reduced with parallel programming models based on the data flow model and the task programming paradigm. Standard numerical algorithms must then be revisited so that they can be parallelized at the finest levels. Iterative linear solvers are a key part of petroleum reservoir simulation, representing up to 80% of the total computing time. Standard preconditioning methods for large, sparse matrices, such as Incomplete LU Factorization (ILU) or Algebraic Multigrid (AMG), fail to scale on architectures with a large number of cores.
We reconsider preconditioning algorithms to better introduce multi-level parallelism: at the coarse level with MPI, at the fine level with threads, and at the instruction level to enable SIMD optimizations. We enhance the implementation of preconditioners such as the multi-level domain decomposition (DDML) preconditioners, based on the popular Additive Schwarz Method (ASM), and the classical ILU0 preconditioner with its fine-grained parallel fixed-point variant. Our approach is validated on linear systems extracted from realistic petroleum reservoir simulations. The robustness of the preconditioners is tested with respect to the data heterogeneities of the study cases. We evaluate the extensibility of our implementation with respect to model size and its scalability with respect to the large number of cores provided by KNL or Skylake processors.
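The abstract does not detail the fixed-point ILU0 variant, so the following is only a minimal sketch of one commonly described formulation, not the authors' implementation. It assumes the factors keep the sparsity pattern of A, that L has a unit diagonal, and that every diagonal entry of A is stored and nonzero. Each sweep recomputes every entry of L and U from the values of the previous sweep only, so all updates within a sweep are independent; a plain Python loop stands in here for the threads that would share that work. The names `tridiagonal`, `fixed_point_ilu0`, and `ilu0_residual` are hypothetical.

```python
def tridiagonal(n, diag=4.0, off=-1.0):
    """Build a small test matrix as a dict {(i, j): value} of nonzeros."""
    a = {}
    for i in range(n):
        a[(i, i)] = diag
        if i > 0:
            a[(i, i - 1)] = off
        if i < n - 1:
            a[(i, i + 1)] = off
    return a


def fixed_point_ilu0(a, sweeps):
    """Approximate ILU(0) factors of `a` with Jacobi-style fixed-point sweeps.

    Returns (l, u): l holds the strictly lower entries (unit diagonal implied),
    u holds the upper entries including the diagonal.
    """
    # Initial guess: the lower and upper parts of A itself.
    l = {(i, j): v for (i, j), v in a.items() if i > j}
    u = {(i, j): v for (i, j), v in a.items() if i <= j}
    for _ in range(sweeps):
        new_l, new_u = {}, {}
        # Every entry reads only the previous sweep's l and u, so the
        # iterations of this loop could run concurrently in any order.
        for (i, j), a_ij in a.items():
            s = sum(l.get((i, k), 0.0) * u.get((k, j), 0.0)
                    for k in range(min(i, j)))
            if i > j:
                new_l[(i, j)] = (a_ij - s) / u[(j, j)]
            else:
                new_u[(i, j)] = a_ij - s
        l, u = new_l, new_u
    return l, u


def ilu0_residual(a, l, u):
    """Largest |a_ij - (L U)_ij| over the sparsity pattern of A."""
    worst = 0.0
    for (i, j), a_ij in a.items():
        lu_ij = sum(l.get((i, k), 0.0) * u.get((k, j), 0.0)
                    for k in range(min(i, j)))
        # Add the term k = min(i, j), using the implied unit diagonal of L.
        lu_ij += u[(i, j)] if i <= j else l[(i, j)] * u[(j, j)]
        worst = max(worst, abs(a_ij - lu_ij))
    return worst


a = tridiagonal(4)
l, u = fixed_point_ilu0(a, sweeps=8)
print(ilu0_residual(a, l, u))
```

A tridiagonal matrix produces no fill-in, so in this toy case the sweeps converge to the exact LU factors; on general sparse matrices a few sweeps give only an approximation of the ILU0 factors, which is typically acceptable for a preconditioner.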