SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

FTXS – Introduction: Workshop on Fault-Tolerance for HPC at Extreme Scale


Workshop: FTXS: Workshop on Fault-Tolerance for HPC at Extreme Scale

Authors: Scott Levy (Sandia National Laboratories), Nathan DeBardeleben (Los Alamos National Laboratory), Keita Teranishi (Sandia National Laboratories), and John Daly (University of Maryland)


Abstract: Increases in the number, variety and complexity of components required to compose next-generation extreme-scale systems mean that systems will experience significant increases in aggregate fault rates, fault diversity and the complexity of isolating root causes. Additionally, the emergence of high-bandwidth memory devices, the continued deployment of burst buffers and the development of near-threshold devices to address power concerns will all create fault tolerance challenges on new systems.

Due to the continued need for research on fault tolerance in extreme-scale systems, the 10th Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2020) will present an opportunity for innovative research ideas to be shared, discussed and evaluated by researchers in fault-tolerance, resilience and reliability from academic, government and industrial institutions. Building on the success of the previous editions of the FTXS workshop, the organizers will assemble quality publications, invited talks and keynotes to facilitate a lively and thought-provoking group discussion.


Website: https://sites.google.com/site/ftxsworkshop/home/ftxs-2020





