SC20 Is Everywhere We Are

Recovering Silent Data Corruption through Spatial Prediction
Event Type
ACM Student Research Competition: Graduate Poster
ACM Student Research Competition: Undergraduate Poster
Tags
Student Program
Registration Categories
TP
Time
Wednesday, 18 November 2020, 4:35pm - 4:46pm EDT
Location
Track 8
Description
High-performance computing (HPC) applications are central to advances in many fields of science and engineering, and those advances rest on the assumed reliability of HPC systems. However, as systems grow larger and hardware components run at near-threshold voltages, transient upset events become more likely. Many works have studied how to detect silent data corruption, but recovery is usually left to checkpoint-restart or to application-specific techniques. This poster explores using spatial similarity in application data to recover from silent data corruption. We evaluate eight reconstruction methods and find that Linear Regression yields the best results: over 90% of its corrections have less than 1% relative error.