LOCATION:Track 8
DTSTART:20201112T161000
DTEND:20201112T163500
UID:submissions.supercomputing.org_SC20_sess214_ws_lasalss105@linklings.com
SUMMARY:High-Order Finite Element Method Using Standard and Device-Level Batch GEMM on GPUs
DESCRIPTION:Workshop\n\nHigh-Order Finite Element Method Using Standard and Device-Level Batch GEMM on GPUs\n\nBeams, Abdelfattah, Tomov, Dongarra, Kolev...\n\nWe present new GPU implementations of the tensor contractions arising from basis-related computations in high-order finite element methods. We consider both tensor and non-tensor bases. For tensor bases, we introduce new kernels based on a series of fused device-level matrix multiplications (GEMMs), specifically designed to utilize the fast memory of the GPU. For non-tensor bases, we develop a tuned framework for choosing the standard batch-BLAS GEMMs that maximize performance across groups of elements. The implementations are included in a backend of the libCEED library. We present benchmark results for the diffusion and mass operators using libCEED integration through the MFEM finite element library and compare them with those of the previously best-performing GPU backends for stand-alone basis computations. For tensor bases, we see improvements of up to 10-30% in some cases, particularly for higher basis orders. For the non-tensor tests, the new batch-GEMM implementation is twice as fast as the previously available one for basis orders greater than five on meshes with more than approximately 10^5 degrees of freedom; speedups of up to ten times are seen for eighth-order basis functions.\n\nTag: Algorithms, Extreme Scale Computing, Performance/Productivity Measurement and Evaluation, Scalable Computing, Scientific Computing\n\nRegistration Category: Workshop Reg Pass
