BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160058Z
LOCATION:Track 4
DTSTART;TZID=America/New_York:20201118T130000
DTEND;TZID=America/New_York:20201118T133000
UID:submissions.supercomputing.org_SC20_sess154_pap555@linklings.com
SUMMARY:Fast Stencil-Code Computation on a Wafer-Scale Processor
DESCRIPTION:Paper\n\nFast Stencil-Code Computation on a Wafer-Scale Proces
 sor\n\nRocki\, Van Essendelft\, Sharapov\, Schreiber\, Morrison...\n\n
 The performance of CPU-based and GPU-based systems is often low for PDE c
 odes\, where large\, sparse and often structured systems of linear equat
 ions must be solved. Iterative solvers are limited by data movement\, bo
 th between caches and memory and among nodes. Here we describe the solut
 ion of such systems of equations on the Cerebras Systems CS-1\, a wafer-
 scale processor that has the memory bandwidth and communication latency t
 o perform well. We achieve 0.86 PFLOPS on a single wafer-scale system fo
 r the solution by BiCGStab of a linear system arising from a 7-point fin
 ite difference stencil on a 600 × 595 × 1536 mesh\, achieving about on
 e third of the machine’s peak performance. We explain the system\, it
 s architecture and programming\, and its performance on this and relate
 d problems. We discuss issues of memory capacity and floating-point prec
 ision. We outline plans to extend this work toward full applications.\n\n
 Tag: Accelerators\, FPGA\, and GPUs\, Applications\, Architectures\n\n
 Registration Category: Tech Program Reg Pass
END:VEVENT
END:VCALENDAR