BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160103Z
LOCATION:Track 3
DTSTART;TZID=America/New_York:20201119T100000
DTEND;TZID=America/New_York:20201119T103000
UID:submissions.supercomputing.org_SC20_sess156_pap582@linklings.com
SUMMARY:Pencil: A Pipelined Algorithm for Distributed Stencils
DESCRIPTION:Paper\n\nPencil: A Pipelined Algorithm for Distributed Stencil
 s\n\nWang, Chandramowlishwaran\n\nStencil computations are at the core of 
 Computational Fluid Dynamics (CFD). Given their memory-bound nature, nume
 rous temporal tiling algorithms have been proposed to improve their perfo
 rmance. Although efficient, most algorithms aim at a single iteration spa
 ce on s
 hared-memory machines. In CFD, however, we are confronted with multiple co
 nnected iteration spaces distributed across many nodes.\n\nWe propose a pi
 pelined stencil algorithm called Pencil for multiple iteration spaces in d
 istributed computing. We identify the optimal combination of MPI and OpenM
 P for temporal tiling based on an in-depth analysis of single-node perform
 ance and exploit deep halo to decouple connected iteration spaces. Moreove
 r, Pencil pipelines the computation and communication to achieve overlap. 
 Evaluated on 4 different stencils across 6 numerical schemes, our algorith
 m demonstrates up to 1.9x speedup over Pluto on a single node and 1.3-3.41
 x speedup compared to an MPI+OpenMP Funneled implementation with space til
 ing for a multi-block grid on 32 nodes.\n\nTag: Algorithms, Graph Algorith
 ms, Linear Algebra\n\nRegistration Category: Tech Program Reg Pass
END:VEVENT
END:VCALENDAR

