BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160052Z
LOCATION:Track 4
DTSTART;TZID=America/New_York:20201117T153000
DTEND;TZID=America/New_York:20201117T160000
UID:submissions.supercomputing.org_SC20_sess165_pap537@linklings.com
SUMMARY:GPU Lifetimes on Titan Supercomputer: Survival Analysis and Reliab
 ility
DESCRIPTION:Paper\n\nGPU Lifetimes on Titan Supercomputer: Survival Analys
 is and Reliability\n\nOstrouchov, Maxwell, Ashraf, Engelmann, Shankar...\n
 \nThe Cray XK7 Titan was the top supercomputer system in the world for a l
 ong time and remained critically important throughout its nearly seven-yea
 r life. It was an interesting machine from a reliability viewpoint as most
  of its power came from 18,688 GPUs whose operation was forced to execute 
 three rework cycles, two on the GPU mechanical assembly and one on the GPU
  circuit boards. We write about the last rework cycle and a reliability
  ana
 lysis of over 100,000 years of GPU lifetimes during Titan’s 6-year-long pr
 oductive period. Using time between failures analysis and statistical surv
 ival analysis techniques, we find that GPU reliability is dependent on hea
 t dissipation to an extent that strongly correlates with detailed nuances 
 of the cooling architecture and job scheduling. We describe the history, d
 ata collection, cleaning and analysis and give recommendations for future 
 supercomputing systems. We make the data and our analysis codes publicly a
 vailable.\n\nTag: Accelerators, FPGA, and GPUs, Reliability and Resiliency
 , Security, System Software and Runtime Systems\n\nRegistration Category: 
 Tech Program Reg Pass
END:VEVENT
END:VCALENDAR

