BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160059Z
LOCATION:Track 5
DTSTART;TZID=America/New_York:20201118T130000
DTEND;TZID=America/New_York:20201118T133000
UID:submissions.supercomputing.org_SC20_sess162_pap241@linklings.com
SUMMARY:Cost-Aware Prediction of Uncorrected DRAM Errors in the Field
DESCRIPTION:Paper\n\nCost-Aware Prediction of Uncorrected DRAM Errors in t
 he Field\n\nBoixaderas, Zivanovic, Moré, Bartolome, Vicente...\n\nThis pap
 er presents and evaluates a method to predict DRAM uncorrected errors, a l
 eading cause of hardware failures in large-scale HPC clusters. The method 
 uses a random forest classifier, which was trained and evaluated using err
 or logs from two years of production of the MareNostrum 3 supercomputer. B
 y enabling the system to take measures to mitigate node failures, our meth
 od reduces lost compute time by up to 57%, a net saving of 21,000 node hou
 rs per year. We release all source code as open source.\n\nWe also discuss
  and clarify aspects of methodology that are essential for a DRAM predicti
 on method to be useful in practice. We explain why standard evaluation met
 rics, such as precision and recall, are insufficient, and base the evaluat
 ion on a cost–benefit analysis. This methodology can help ensure that any 
 DRAM error predictor is free from training bias and has a clear cost–bene
 fit calculation.\n\nTag: Machine Learning, Deep Learning and Artificial In
 telligence, Requirements, Performance, and Benchmarks, Reliability and Res
 iliency\n\nRegistration Category: Tech Program Reg Pass
END:VEVENT
END:VCALENDAR

