BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160557Z
LOCATION:Track 7
DTSTART;TZID=America/New_York:20201111T143000
DTEND;TZID=America/New_York:20201111T150000
UID:submissions.supercomputing.org_SC20_sess199_ws_dls102@linklings.com
SUMMARY:Time-Based Roofline for Deep Learning Performance Analysis
DESCRIPTION:Workshop\n\nTime-Based Roofline for Deep Learning Performance
  Analysis\n\nWang\, Yang\, Farrell\, Zhang\, Kurth...\n\nDeep learning
  applications based on neural networks are generating considerable
  interest in various fields due to their high accuracy. Such an
  application is usually very compute-intensive and thus requires a long
  run time. Researchers and engineers are actively exploring new
  solutions to this issue from both the hardware and software/algorithm
  sides. Little previous work\, however\, has focused on providing a
  practical methodology to characterize deep learning performance
  bottlenecks and potentially guide subsequent optimization efforts. In
  this paper\, we introduce an extension of the Roofline model and use it
  to analyze two representative computation kernels in deep learning\,
  2D convolution and long short-term memory\, on NVIDIA GPUs. This new
  time-based Roofline model incorporates both compute/bandwidth
  complexity and run time in its formulae to demonstrate performance
  issues that cannot be reflected by the classic Roofline. Factors such
  as arithmetic intensity\, data transfer\, kernel launch overhead\, and
  tensor core usage are examined by varying parameters such as batch
  size and feature size. This work helps form a more systematic way to
  understand the performance issues of deep learning applications. Last
  but not least\, this generic performance model can be applied to a wide
  category of applications besides deep learning.\n\nRegistration
  Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR

