BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160554Z
LOCATION:Track 4
DTSTART;TZID=America/New_York:20201111T152000
DTEND;TZID=America/New_York:20201111T154000
UID:submissions.supercomputing.org_SC20_sess201_ws_ia106@linklings.com
SUMMARY:DistDGL: Distributed Graph Neural Network Training for Billion-Sca
 le Graphs
DESCRIPTION:Workshop\n\nDistDGL: Distributed Graph Neural Network Training
  for Billion-Scale Graphs\n\nZheng, Ma, Wang, Zhou, Su...\n\nGraph neural 
 networks (GNN) have shown great success in learning from graph-structured 
 data. They are widely used in various applications, such as recommendatio
 n, fraud detection, and search. In these domains, the graphs are typically
  large, containing hundreds of millions of nodes and several billions of e
 dges. To tackle this challenge, we develop DistDGL, a system for training 
 GNNs in a mini-batch fashion on a cluster of machines. DistDGL is based on
  the Deep Graph Library (DGL), a popular GNN development framework.\n\nDis
 tDGL distributes the graph and its associated data (initial features and e
 mbeddings) across the machines and uses this distribution to derive a comp
 utational decomposition by following an owner-compute rule. DistDGL follow
 s a synchronous training approach and allows ego-networks forming the mini
 -batches to include non-local nodes. To minimize the overheads associated 
 with distributed computations, DistDGL uses a high-quality and light-weigh
 t min-cut graph partitioning algorithm along with multiple balancing const
 raints. This allows it to reduce communication overheads and statically ba
 lance the computations. It further reduces the communication by replicatin
 g halo nodes and by using sparse embedding updates. The combination of th
 ese design choices allows DistDGL to train high-quality models while achie
 ving high parallel efficiency and memory scalability. We demonstrate our 
 optimizations on both inductive and transductive GNN models. Our results 
 show that DistDGL achieves linear speedup and requires only 25 seconds to 
 complete a training epoch for a graph with 100 million nodes and 3 billion
  edges on a cluster with eight machines.\n\nRegistration Category: Worksho
 p Reg Pass
END:VEVENT
END:VCALENDAR