SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Time-Based Roofline for Deep Learning Performance Analysis


Workshop: The 5th Deep Learning on Supercomputers Workshop

Authors: Yunsong Wang, Charlene Yang, Steven Farrell, and Yang Zhang (Lawrence Berkeley National Laboratory); Thorsten Kurth (Nvidia Corporation); and Samuel Williams (Lawrence Berkeley National Laboratory)


Abstract: Deep learning applications based on neural networks are generating considerable interest in various fields due to their high accuracy. Such applications are usually very compute-intensive and thus require long run times. Researchers and engineers are actively exploring new solutions to this issue on both the hardware and software/algorithm sides. Little previous work, however, has focused on providing a practical methodology to characterize deep learning performance bottlenecks and guide subsequent optimization efforts. In this paper, we introduce an extension of the Roofline model and use it to analyze two representative computation kernels in deep learning, 2D convolution and long short-term memory, on NVIDIA GPUs. This new time-based Roofline model incorporates both compute/bandwidth complexity and run time in its formulae to reveal performance issues that the classic Roofline model cannot capture. Factors such as arithmetic intensity, data transfer, kernel launch overhead, and Tensor Core usage are examined by varying parameters such as batch size and feature size. This work helps form a more systematic way to understand the performance issues of deep learning applications. Last but not least, this generic performance model can be applied to a wide range of applications beyond deep learning.
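
As background for the abstract above, the classic Roofline model bounds attainable throughput by compute and memory ceilings, and a time-based variant can be sketched by restating those same bounds in run-time terms. The following is a minimal illustration using generic symbols (W for total FLOPs, Q for total bytes moved, P_peak for peak compute rate, B_peak for peak bandwidth, I = W/Q for arithmetic intensity), not necessarily the paper's exact notation:

\[
P_{\text{attainable}} = \min\!\left(P_{\text{peak}},\; I \cdot B_{\text{peak}}\right), \qquad I = \frac{W}{Q}
\]

\[
T \;\ge\; \max\!\left(\frac{W}{P_{\text{peak}}},\; \frac{Q}{B_{\text{peak}}}\right)
\]

Expressing the bound as a time rather than a rate makes kernels of different sizes directly comparable and lets per-kernel costs that do not scale with FLOPs, such as launch overhead or data transfer, show up additively in measured run time.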




