BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160548Z
LOCATION:Track 3
DTSTART;TZID=America/New_York:20201112T153000
DTEND;TZID=America/New_York:20201112T160000
UID:submissions.supercomputing.org_SC20_sess209_ws_mlhpce105@linklings.com
SUMMARY:Accelerate Distributed Stochastic Gradient Descent for Nonconvex O
ptimization with Momentum
DESCRIPTION:Workshop\n\nAccelerate Distributed Stochastic Gradient Desce
 nt for Nonconvex Optimization with Momentum\n\nCong\, Liu\n\nThe moment
 um method has been used extensively in optimizers for deep learning. Re
 cent studies show that distributed training through K-step averaging ha
 s many desirable properties. We propose a momentum method for such mode
 l-averaging approaches. At the individual learner level\, traditional s
 tochastic gradient descent is applied. At the meta-level (the global le
 arner level)\, a momentum term is applied\, which we call block momentu
 m. We analyze the convergence and scaling properties of such momentum m
 ethods. Our experimental results show that block momentum not only acce
 lerates training but also achieves better results.\n\nRegistration Cate
 gory: Workshop Reg Pass
END:VEVENT
END:VCALENDAR