BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160553Z
LOCATION:Track 11
DTSTART;TZID=America/New_York:20201111T100000
DTEND;TZID=America/New_York:20201111T133000
UID:submissions.supercomputing.org_SC20_sess204@linklings.com
SUMMARY:FTXS: Workshop on Fault-Tolerance for HPC at Extreme Scale
DESCRIPTION:Workshop\n\nFTXS – Introduction: Workshop on Fault-Tolerance f
 or HPC at Extreme Scale\n\nLevy, DeBardeleben, Teranishi, Daly\n\nIncrease
 s in the number, variety and complexity of components required to compose 
 next-generation extreme-scale systems mean that systems will experience si
 gnificant increases in aggregate fault rates, fault diversity and the comp
 lexity of isolating root causes. Additionally, the emergence of high-...\
 n\n---------------------\nFTXS – Break\n\nLevy\n\n---------------------\nC
 heckpointing OpenSHMEM Programs Using Compiler Analysis\n\nShahneous Bari,
  Basu, Lu, Curtis, Chapman\n\nThe importance of fault-tolerance continues 
 to increase for HPC applications. The continued growth in size and complex
 ity of HPC systems, and of the applications themselves, is leading to an i
 ncreased likelihood of failures during execution. Most HPC programming mod
 els, however, lack a built-in faul...\n\n---------------------\nA Generic 
 Strategy for Node-Failure Resilience for Certain Iterative Linear Algebra 
 Methods\n\nPachajoa, Ernstbrunner, Gansterer\n\nResilience is an important
  research topic in HPC. As computer clusters go to extreme scales, work in
  this area is necessary to keep these machines reliable.\n\nIn this work, 
 we introduce a generic method to endow iterative algorithms in linear alge
 bra based on sparse matrix-vector products, such as li...\n\n-------------
 --------\nFTXS – Closing Remarks\n\nLevy\n\n---------------------\nFrom Ta
 sks Graphs to Asynchronous Distributed Checkpointing with Local Restart\n\
 nLion, Thibault\n\nThe ever-increasing number of computation units assembl
 ed in current HPC platforms leads to a concerning increase in fault probab
 ility. Traditional checkpoint/restart strategies avoid wasting large amoun
 ts of computation time when such a fault occurs. With the increasing amoun
 t of data processed by to...\n\n---------------------\nModels for Resilien
 ce
  Design Patterns\n\nKumar, Engelmann\n\nResilience plays an important role
  in supercomputers by providing correct and efficient operation in case of
  faults, errors, and failures. Resilience design patterns offer blueprints
  for effectively applying resilience technologies. Prior work focused on d
 eveloping initial efficiency and performance...\n\n---------------------\n
 Towards Distributed Software Resilience in Asynchronous Many-Task Programm
 ing Models\n\nGupta, Mayo, Lemoine, Kaiser\n\nExceptions and errors occurr
 ing within mission critical applications due to hardware failures have a h
 igh cost. With the emerging next generation platforms (NGPs), the rate of 
 hardware failures will likely increase. Designing our applications to be r
 esilient, therefore, is a critical concern in orde...\n\n-----------------
 ----\nImproving Scalability of Silent-Error Resilience for Message-Passing
  Solvers via Local Recovery and Asynchrony\n\nKolla, Mayo, Teranishi, Arms
 trong\n\nBenefits of local recovery (restarting only a failed process or t
 ask) have been previously demonstrated in parallel solvers. Local recovery
  has a reduced impact on application performance due to masking of failure
  delays (for message-passing codes) or dynamic load balancing (for asynchr
 onous many-ta...\n\n\nTag: Extreme Scale Computing, Fault Tolerance, Relia
 bility and Resiliency\n\nRegistration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR

