BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160210Z
LOCATION:Track 7
DTSTART;TZID=America/New_York:20201119T110000
DTEND;TZID=America/New_York:20201119T113000
UID:submissions.supercomputing.org_SC20_sess292_drs106@linklings.com
SUMMARY:The Coming of Age of Multithreaded High-Performance Communication
DESCRIPTION:Doctoral Showcase\n\nThe Coming of Age of Multithreaded High-P
 erformance Communication\n\nZambre, Chandramowlishwaran\n\nThe supercomput
 ing community holds an outdated view: the network is a single device. Mode
 rn interconnects, however, feature multiple network hardware contexts that
  serve as parallel interfaces into the network from a single node. Additio
 nally, as we approach the throughput limits of a single network link,
  supercomputers are deploying multiple NICs per node to provide higher
  bandwidth per node. Hence, the modern reality is that the network
  features abundant parallelism. The outdated view drastically hurts the
  communication performance of the MPI+threads model, which is increasingly
  being adopted over the traditional MPI-everywhere model to better map to
  modern processors that have a smaller share of resources per core
  than previous processors. Domain scientists typically do not expose
  logical par
 allelism in their MPI+threads communication, and MPI libraries still use c
 onservative approaches, such as a global critical section, to maintain MPI
 ’s ordering constraints, thus serializing access to the parallel net
 work resources and limiting performance. The goal of this dissertation is 
 to dissolve the communication bottleneck in MPI+threads. Existing solution
 s either sacrifice correctness for performance or jump to MPI standard ext
 ensions without fairly evaluating the capabilities of the existing
  standard. Through holistic bottom-up analyses, this dissertation first
  investigates the limits of multithreaded communication on modern network
  hardware, then
  devises a new MPI-3.1 implementation with virtual communication interface
 s (VCIs) for fast MPI+threads communication. The domain scientist can use 
 the VCIs either explicitly (MPI Endpoints) or implicitly (MPI-3.1). The di
 ssertation compares the two solutions through both performance and usabili
 ty lenses.\n\nRegistration Category: Tech Program Reg Pass
END:VEVENT
END:VCALENDAR

