BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160105Z
LOCATION:Track 4
DTSTART;TZID=America/New_York:20201119T150000
DTEND;TZID=America/New_York:20201119T153000
UID:submissions.supercomputing.org_SC20_sess170_pap391@linklings.com
SUMMARY:Compiling Generalized Histograms for GPU
DESCRIPTION:Paper\n\nCompiling Generalized Histograms for GPU\n\n
 Henriksen\, Hellfritzsch\, Sadayappan\, Oancea\n\nWe present and
  evaluate an implementation technique for histogram-like computations
  on GPUs that ensures work-efficient asymptotic cost\, support for
  arbitrary associative and commutative operators\, and efficient use
  of hardware-supported atomic operations when applicable. Based on a
  systematic empirical examination of the design space\, we develop a
  technique that balances conflict rates and memory footprint.\n\nWe
  demonstrate our technique as a library implementation in CUDA and by
  extending the parallel array language Futhark with a new construct
  for expressing generalized histograms\, which we support with several
  compiler optimizations. We show that our histogram implementation\,
  taken in isolation\, outperforms similar primitives from CUB\, and
  that it is competitive with or outperforms the hand-written code of
  several application benchmarks\, even when the latter is specialized
  for a class of datasets.\n\nTag: Accelerators\, FPGA\, and GPUs\,
  Compilers Analysis and Optimization\n\nRegistration Category: Tech
  Program Reg Pass
END:VEVENT
END:VCALENDAR

