SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

DRCCTPROF: A Fine-Grained Call Path Profiler for ARM-Based Clusters

Authors: Qidong Zhao (William & Mary), Xu Liu (North Carolina State University), and Milind Chabbi (Scalable Machines Research)

Abstract: ARM is an attractive CPU architecture for exascale systems because of its energy efficiency. As a recent entrant to HPC, however, ARM lags behind in its software stack, especially in performance tooling. Notably, there is a lack of fine-grained measurement tools for analyzing fully optimized HPC binary executables on ARM processors. In this paper, we introduce DRCCTPROF, a fine-grained call path profiling framework for binaries running on ARM architectures. The unique ability of DRCCTPROF is that it obtains the full calling context of any and every machine instruction that executes, which provides detailed diagnostic feedback for performance optimization and correctness tools. Furthermore, DRCCTPROF not only associates any instruction with source code along its call path but also maps memory access instructions back to the data objects they touch. Finally, DRCCTPROF incurs moderate overhead and provides a compact view for visualizing the profiles collected from parallel executions.
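The abstract does not describe DRCCTPROF's internals, so the following Python is only a minimal, hypothetical sketch of the two ideas it names: attributing every executed instruction to its full calling context via a calling context tree, and mapping memory accesses back to data objects. All names (`CCTNode`, `CallPathProfiler`, `on_call`, `on_memory_access`, and so on) are illustrative assumptions, not DRCCTPROF's API. A real tool would receive these events from binary instrumentation rather than from explicit method calls.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class CCTNode:
    """A calling-context-tree node: one code location reached via one unique call path."""
    location: str
    parent: "CCTNode | None" = None
    children: dict = field(default_factory=dict)
    count: int = 0                             # executions of this instruction in this context
    data: dict = field(default_factory=dict)   # data-object name -> access count

    def child(self, location):
        # Reuse an existing child so identical call paths share a single node.
        if location not in self.children:
            self.children[location] = CCTNode(location, self)
        return self.children[location]

    def path(self):
        # Follow parent links back to the root to recover the full calling context.
        node, frames = self, []
        while node is not None:
            frames.append(node.location)
            node = node.parent
        return list(reversed(frames))


class CallPathProfiler:
    """Hypothetical event-driven profiler; event names are illustrative only."""

    def __init__(self):
        self.root = CCTNode("<root>")
        self.stack = [self.root]  # shadow stack of the currently active call sites
        self.starts = []          # sorted start addresses of known data objects
        self.objects = []         # (start, end, name), kept parallel to self.starts

    def on_call(self, call_site):
        # Entering a callee: the call site becomes the new calling context.
        self.stack.append(self.stack[-1].child(call_site))

    def on_return(self):
        self.stack.pop()

    def on_instruction(self, pc):
        # Attribute this instruction to a leaf under the current calling context.
        node = self.stack[-1].child(pc)
        node.count += 1
        return node

    def on_alloc(self, start, size, name):
        # Record a data object's address range, keeping ranges sorted by start.
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.objects.insert(i, (start, start + size, name))

    def data_object(self, addr):
        # The candidate is the last object starting at or before addr.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            _, end, name = self.objects[i]
            if addr < end:
                return name
        return None

    def on_memory_access(self, pc, addr):
        # Attribute the access to both its calling context and its data object.
        node = self.on_instruction(pc)
        obj = self.data_object(addr)
        if obj is not None:
            node.data[obj] = node.data.get(obj, 0) + 1
        return node, obj


prof = CallPathProfiler()
prof.on_alloc(0x1000, 64, "array_a")
prof.on_call("main:call solve")
node, obj = prof.on_memory_access("solve:ldr", 0x1010)
print(" -> ".join(node.path()), obj)  # <root> -> main:call solve -> solve:ldr array_a
prof.on_return()
```

Keying each child by location means the same instruction reached through different call paths lands in different tree nodes. That separation is what lets a profile distinguish, for example, a hot load in one caller from the same load in another caller.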
