BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160057Z
LOCATION:Track 3
DTSTART;TZID=America/New_York:20201118T110000
DTEND;TZID=America/New_York:20201118T113000
UID:submissions.supercomputing.org_SC20_sess180_pap486@linklings.com
SUMMARY:GEMS: GPU-Enabled Memory-Aware Model-Parallelism System for Distri
 buted DNN Training
DESCRIPTION:Paper\n\nGEMS: GPU-Enabled Memory-Aware Model-Parallelism Syst
 em for Distributed DNN Training\n\nJain, Awan, Aljuhani, Hashmi, Anthony..
 .\n\nData-parallelism has become an established paradigm for training
  DNNs that fit in GPU memory on large-scale HPC systems. Model-parallelis
 m, however, is required to train out-of-core DNNs. In this paper, we deal 
 with emerging requirements brought forward by very large DNNs being
  trained using high-resolution images common in digital pathology. To
  address the
 se, we propose, design and implement GEMS, a GPU-Enabled Memory-Aware Mode
 l-Parallelism System. We present several design schemes like GEMS-MAST, GE
 MS-MASTER and GEMS-Hybrid that offer excellent speedups over state-of-the-
 art systems like Mesh-TensorFlow and FlexFlow. Furthermore, we combine mod
 el-parallelism and data-parallelism to train a 1000-layer ResNet-1k model 
 using 1024 Volta V100 GPUs with 97.32% scaling efficiency. For a
  real-world histopathology whole-slide image (WSI) of 100,000 x 100,000
  pixels, we train a custom ResNet-110-v2 on image tiles of size 1024 x
  1024 and reduce the training time from seven hours to 28
  minutes.\n\nTag: Accelerators, FP
 GA, and GPUs, Machine Learning, Deep Learning and Artificial Intelligence,
  Scalable Computing\n\nRegistration Category: Tech Program Reg Pass
END:VEVENT
END:VCALENDAR

