Empirical Modeling of Spatially Diverging Performance
Time: Thursday, 12 November 2020, 4:30pm - 5pm EDT
Description: A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. The underlying assumption does not hold equally for all applications, however, and knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as harmful patterns of load imbalance. A challenge for empirical modeling tools arises from the fact that the behavioral class of a process may depend on the process configuration, so that process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to modeling spatially diverging performance based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question run up to 20x faster and the application as a whole up to 4.5x faster.
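To illustrate why summarizing across all processes can hide diverging behavior, the following is a minimal sketch in Python. It is not the clustering technique of the paper, nor Extra-P's implementation: the function `split_ranks`, the threshold `factor`, and the measurements in `runs` are all hypothetical and made up for illustration. It assumes a simple rule that flags a rank as diverging when its runtime exceeds a multiple of the median over all ranks.

```python
from statistics import median


def split_ranks(times, factor=2.0):
    """Split the ranks of one run into 'typical' and 'diverging' groups.

    times:  per-rank runtimes (seconds) of one function in one run.
    factor: hypothetical threshold; a rank counts as diverging if its
            runtime exceeds factor * (median over all ranks).
    """
    mid = median(times)
    typical = [r for r, t in enumerate(times) if t <= factor * mid]
    diverging = [r for r, t in enumerate(times) if t > factor * mid]
    return mid, typical, diverging


# Made-up measurements: process count -> per-rank runtimes.
runs = {
    4: [1.0, 1.1, 1.0, 4.0],
    8: [1.0, 1.1, 1.0, 1.0, 1.1, 1.0, 8.0, 8.2],
}

# Classify every configuration separately, because the set of
# diverging ranks may change with the number of processes.
summary = {p: split_ranks(t) for p, t in runs.items()}

for p, (mid, typical, diverging) in summary.items():
    print(f"p={p}: median={mid:.2f}s, diverging ranks={diverging}")
```

In this made-up data the median stays the same at both scales, so a model fitted to the median alone would look constant, while the slow ranks roughly double their runtime and change identity (rank 3 at 4 processes, ranks 6 and 7 at 8 processes). This is the kind of class migration the abstract describes as a challenge.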