FTXS – Introduction: Workshop on Fault-Tolerance for HPC at Extreme Scale
Reliability and Resiliency
Time: Wednesday, 11 November 2020, 10am - 10:05am EST
Description: Increases in the number, variety and complexity of components required to compose next-generation extreme-scale systems mean that systems will experience significant increases in aggregate fault rates, fault diversity and the complexity of isolating root causes. Additionally, the emergence of high-bandwidth memory devices, the continued deployment of burst buffers and the development of near-threshold devices to address power concerns will all create fault tolerance challenges on new systems.
Due to the continued need for research on fault tolerance in extreme-scale systems, the 10th Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS 2020) will present an opportunity for innovative research ideas to be shared, discussed and evaluated by researchers in fault-tolerance, resilience and reliability from academic, government and industrial institutions. Building on the success of the previous editions of the FTXS workshop, the organizers will assemble quality publications, invited talks and keynotes to facilitate a lively and thought-provoking group discussion.