BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160553Z
LOCATION:Track 11
DTSTART;TZID=America/New_York:20201111T122500
DTEND;TZID=America/New_York:20201111T125500
UID:submissions.supercomputing.org_SC20_sess204_ws_ftxs107@linklings.com
SUMMARY:A Generic Strategy for Node-Failure Resilience for Certain Iterati
 ve Linear Algebra Methods
DESCRIPTION:Workshop\n\nA Generic Strategy for Node-Failure Resilience for
  Certain Iterative Linear Algebra Methods\n\nPachajoa, Ernstbrunner, Ganst
 erer\n\nResilience is an important research topic in HPC. As computer clus
 ters go to extreme scales, work in this area is necessary to keep these ma
 chines reliable.\n\nIn this work, we introduce a generic method to endow i
 terative algorithms in linear algebra based on sparse matrix-vector produc
 ts, such as linear system solvers and eigensolvers, with resilience to
  node failures. This generic method traverses the dependency graph of the
  variabl
 es of the iterative algorithm. If the iterative method exhibits certain pr
 operties, it is possible to produce an exact state reconstruction (ESR) al
 gorithm, enabling the recovery of the state of the iterative method in the
  event of a node failure. This reconstruction is exact, except for small p
 erturbations caused by floating-point arithmetic. The generic method explo
 its redundancy in the matrix-vector product to protect the vector that is 
 the argument of the product.\n\nWe illustrate the use of this generic appr
 oach on three iterative methods: the conjugate gradient method, the BiCGSt
 ab method and the Lanczos algorithm. The resulting ESR algorithms enable t
 he reconstruction of their state after a node failure from a few redundant
 ly stored vectors.\n\nUnlike previous work on the preconditioned
  conjugate gradient method, this generic method produces ESR algorithms
  that work with general matrices. Consequently, we can no longer assume
  that local diagonal submat
 rices used to reconstruct vectors are non-singular. Thus, we also propose 
 an approach for deriving non-singular local linear systems for the reconst
 ruction process with reduced condition numbers, based on a communication-a
 voiding rank-revealing QR factorization with column pivoting.\n\nTag: Extr
 eme Scale Computing, Fault Tolerance, Reliability and Resiliency\n\nRegist
 ration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR

