BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160045Z
LOCATION:Track 4
DTSTART;TZID=America/New_York:20201117T103000
DTEND;TZID=America/New_York:20201117T110000
UID:submissions.supercomputing.org_SC20_sess175_pap302@linklings.com
SUMMARY:Sparse GPU Kernels for Deep Learning
DESCRIPTION:Paper\n\nSparse GPU Kernels for Deep Learning\n\nGale, Zaharia
 , Young, Elsen\n\nScientific workloads have traditionally exploited high l
 evels of sparsity to accelerate computation and reduce memory requirements
 . While deep neural networks can be made sparse, achieving practical speed
 ups on GPUs is difficult because these applications have relatively modera
 te levels of sparsity that are not sufficient for existing sparse kernels 
 to outperform their dense counterparts. In this work, we study sparse matr
 ices from deep learning applications and identify favorable properties tha
 t can be exploited to accelerate computation. Based on these insights, we 
 develop high-performance GPU kernels for two sparse matrix operations wide
 ly applicable in neural networks: sparse matrix-dense matrix multiplicatio
 n and sampled dense-dense matrix multiplication. Our kernels reach 27% of 
 single-precision peak on Nvidia V100 GPUs. Using our kernels, we demonstra
 te sparse Transformer and MobileNet models that achieve 1.2-2x speedups an
 d up to 12.8x memory savings without sacrificing accuracy.\n\nTag: Acceler
 ators, FPGA, and GPUs, Machine Learning, Deep Learning and Artificial Inte
 lligence, Sparse Computation\n\nRegistration Category: Tech Program Reg Pa
 ss
END:VEVENT
END:VCALENDAR