SC20 Virtual Platform
ScalAna: Automating Scaling Loss Detection with Graph Analysis
Event Type: Paper
Tags: Graph Algorithms; Performance/Productivity Measurement and Evaluation
Registration Categories: TP
Time: Tuesday, 17 November 2020, 1pm - 1:30pm EST
Location: Track 3
Description: Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks are based on either profiling or tracing. Profiling incurs low overhead but does not capture the detailed dependencies needed for root-cause analysis. Tracing captures all of this information, but at prohibitive overhead.

In this work, we design ScalAna, which uses static analysis techniques to provide the analyzability of tracing at a cost similar to profiling. ScalAna first combines static compiler techniques with lightweight runtime techniques to build a Program Performance Graph. On this graph, we propose a novel backtracking algorithm that automatically detects the root causes of scaling loss. We evaluate ScalAna with real applications. Results show that ScalAna effectively locates the root causes and incurs 1.73% overhead on average for up to 2048 processes. By fixing the root causes, we achieve up to 11.11% performance improvement on 2048 processes.
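
To make the idea of backtracking over a performance graph more concrete, here is a minimal Python sketch. It is not ScalAna's algorithm or API. It assumes a toy graph in which each vertex records when it finished and which vertices it depends on, and it uses a simple heuristic chosen for illustration: starting from a communication vertex that shows the symptom, keep stepping to the predecessor that finished last until a computation vertex is reached. The names Vertex, kind, finish, preds, and backtrack are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    # All fields are hypothetical; they are not taken from ScalAna.
    name: str      # label, e.g. a loop or a communication call site
    kind: str      # "comp" for computation, "comm" for communication
    finish: float  # time at which this vertex finished (seconds)
    preds: list = field(default_factory=list)  # vertices it depends on


def backtrack(symptom):
    """Walk dependence edges backward from a symptom vertex.

    While the current vertex is a communication vertex with predecessors,
    move to the predecessor that finished last, since that is the one the
    current vertex had to wait for. Stop at the first computation vertex.
    Returns the path from the symptom to the suspected root cause.
    """
    path = [symptom]
    seen = {id(symptom)}
    current = symptom
    while current.kind == "comm" and current.preds:
        nxt = max(current.preds, key=lambda v: v.finish)
        if id(nxt) in seen:  # guard against cycles in a malformed graph
            break
        seen.add(id(nxt))
        path.append(nxt)
        current = nxt
    return path


# Toy graph: rank 0 finishes its loop early and then waits in a receive
# for rank 1, whose compute loop is slow.
slow_loop = Vertex("rank1:compute_loop", "comp", finish=4.0)
send = Vertex("rank1:send", "comm", finish=4.1, preds=[slow_loop])
fast_loop = Vertex("rank0:compute_loop", "comp", finish=1.0)
recv = Vertex("rank0:recv", "comm", finish=4.1, preds=[fast_loop, send])

root_cause_path = [v.name for v in backtrack(recv)]
print(" <- ".join(root_cause_path))
```

In this toy case the walk goes from the receive on rank 0, through the matching send on rank 1, to the slow compute loop on rank 1. A profile alone would show only that rank 0 spends a long time in the receive; the dependence edges are what connect that symptom to the loop on another process.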