BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160556Z
LOCATION:Track 7
DTSTART;TZID=America/New_York:20201111T113000
DTEND;TZID=America/New_York:20201111T120000
UID:submissions.supercomputing.org_SC20_sess199_ws_dls101@linklings.com
SUMMARY:Online-Codistillation Meets LARS: Going beyond the Limit of Data P
 arallelism in Deep Learning
DESCRIPTION:Workshop\n\nOnline-Codistillation Meets LARS: Going beyond
  the Limit of Data Parallelism in Deep Learning\n\nMurai\, Mikami\,
  Koyama\, Suzuki\, Akiba\n\nData-parallel training is a powerful family
  of methods for the efficient training of deep neural networks on big
  data. However\, recent studies have shown that the benefit of
  increasing the batch size\, in terms of both speed and model
  performance\, diminishes rapidly beyond some point. This seems to
  apply even to LARS\, the state-of-the-art large-batch stochastic
  optimization method.\n\nIn this paper\, we combine LARS with
  online-codistillation\, a recently developed\, efficient deep
  learning algorithm built on an entirely different philosophy of
  stabilizing the training procedure using a collaborative ensemble
  of models. We show that the combination of large-batch training
  and online-codistillation is much more efficient than either one
  alone. We also present a novel way of implementing
  online-codistillation that can further speed up the computation.
  We will demonstrate the efficacy of our approach on various
  benchmark datasets.\n\nRegistration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR

