SC20 Virtual Platform
High-Order Finite Element Method Using Standard and Device-Level Batch GEMM on GPUs
Event Type
Extreme Scale Computing
Performance/Productivity Measurement and Evaluation
Scalable Computing
Scientific Computing
Registration Categories
Time: Thursday, 12 November 2020, 4:10pm - 4:35pm EDT
Location: Track 8
Description: We present new GPU implementations of the tensor contractions arising from basis-related computations for high-order finite element methods. We consider both tensor and non-tensor bases. In the case of tensor bases, we introduce new kernels based on
a series of fused device-level matrix multiplications (GEMMs), specifically designed to
utilize the fast memory of the GPU. For non-tensor bases, we develop a tuned
framework for choosing standard batch-BLAS GEMMs that will maximize performance
across groups of elements. The implementations are included in a backend of the
libCEED library. We present benchmark results for the diffusion and
mass operators using libCEED integration through the MFEM finite element library
and compare to those of the previously best-performing GPU backends for
stand-alone basis computations. For tensor bases, we see improvements of 10-30%
in some cases, particularly for higher basis orders. For the non-tensor tests,
the new batch-GEMM implementation is twice as fast as what was previously available
for basis function orders greater than five and meshes with more than approximately 10^5
degrees of freedom; speedups of up to ten times are seen for eighth-order basis functions.
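
The distinction between tensor and non-tensor bases in the description above can be illustrated with a small CPU reference sketch. It is not the libCEED implementation and uses no libCEED API; every function name below is hypothetical, and plain Python lists stand in for GPU memory. It assumes a 2D element whose tensor basis is built from one 1D interpolation matrix with Q rows (quadrature points) and P columns (nodes). The tensor path applies that small matrix once per dimension as two GEMMs, while the non-tensor path applies one dense basis matrix to a group of elements in a single GEMM, which stands in here for a batch-GEMM call.

```python
# Illustrative sketch only: all names are hypothetical, not libCEED API.

def matmul(a, b):
    """Dense matrix product of two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def kron(a, b):
    """Kronecker product of a and b (row and column index of b vary fastest)."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def interp_tensor_2d(b1d, u):
    """Tensor basis: apply the 1D matrix b1d (Q x P) along each dimension.

    u holds one element's nodal values as a P x P matrix. The result is the
    Q x Q matrix b1d * u * b1d^T, computed as two small GEMMs.
    """
    b1d_t = [list(col) for col in zip(*b1d)]
    return matmul(matmul(b1d, u), b1d_t)


def interp_dense_batch(b, elems):
    """Non-tensor basis: apply one dense matrix b to a group of elements.

    elems is a list of per-element coefficient vectors. They are stacked as
    the columns of one matrix so that a single GEMM handles the whole group.
    """
    u = [list(col) for col in zip(*elems)]   # one column per element
    v = matmul(b, u)                         # one column of results per element
    return [list(col) for col in zip(*v)]


if __name__ == "__main__":
    b1d = [[1, 0], [0.5, 0.5], [0, 1]]       # Q = 3 points, P = 2 nodes
    u = [[1, 2], [3, 4]]                     # nodal values of one element

    tensor_result = interp_tensor_2d(b1d, u)

    # The same operator written as one dense matrix acting on the
    # row-major flattening of u.
    dense_result = interp_dense_batch(kron(b1d, b1d), [[1, 2, 3, 4]])[0]

    flat = [x for row in tensor_result for x in row]
    print(all(abs(p - q) < 1e-12 for p, q in zip(flat, dense_result)))
```

Both paths give the same values for a tensor basis, but the tensor path performs on the order of Q·P² + Q²·P multiply-adds per 2D element, whereas the dense path performs Q²·P². A non-tensor basis has no such factorization, so the dense form applied across many elements is the natural fit for batched GEMM routines.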