BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160555Z
LOCATION:Track 8
DTSTART;TZID=America/New_York:20201112T161000
DTEND;TZID=America/New_York:20201112T163500
UID:submissions.supercomputing.org_SC20_sess214_ws_lasalss105@linklings.co
m
SUMMARY:High-Order Finite Element Method Using Standard and Device-Level B
atch GEMM on GPUs
DESCRIPTION:Workshop\n\nHigh-Order Finite Element Method Using Standard an
d Device-Level Batch GEMM on GPUs\n\nBeams, Abdelfattah, Tomov, Dongarra,
Kolev...\n\nWe present new GPU implementations of the tensor contractions
arising from basis-related computations for high-order finite element meth
ods. We consider both tensor and non-tensor bases. In the case of tensor
bases, we introduce new kernels based on\na series of fused device-level m
atrix multiplications (GEMMs), specifically designed to\nutilize the fast
memory of the GPU. For non-tensor bases, we develop a tuned\nframework fo
r choosing standard batch-BLAS GEMMs that will maximize performance\nacros
s groups of elements. The implementations are included in a backend of th
e\nlibCEED library. We present benchmark results for the diffusion and\nm
ass operators using libCEED integration through the MFEM finite element li
brary\nand compare to those of the previously best-performing GPU backends
for\nstand-alone basis computations. In tensor cases, we see improvements
of up to 10-30%\nfor some cases, particularly for higher basis orders. F
or the non-tensor tests,\nthe new batch-GEMM implementation is twice as fa
st as what was previously available\nfor basis function order greater than
five and greater than approximately 10^5 degrees\nof freedom in the mesh;
up to ten times speedup is seen for eighth-order basis functions.\n\nTag:
Algorithms, Extreme Scale Computing, Performance/Productivity Measurement
and Evaluation, Scalable Computing, Scientific Computing\n\nRegistration
Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR