SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Performance Characteristics of Virtualized GPUs for Deep Learning


Workshop: SuperCompCloud: 3rd International Workshop on Interoperability of Supercomputing and Cloud Technologies

Authors: Scott Michael, Scott Teige, Junjie Li, John Lowe, George Turner, and Robert Henschel (Indiana University)


Abstract: As deep learning techniques and algorithms become increasingly common in scientific workflows, HPC centers are grappling with how best to provide GPU resources and support deep learning workloads. One novel method of deployment is to virtualize GPU resources, allowing multiple VM instances to have logically distinct virtual GPUs (vGPUs) on a shared physical GPU. There are many operational and performance implications to consider, however, before deploying a vGPU service in an HPC center. In this paper, we investigate the performance characteristics of vGPUs for both traditional HPC workloads and deep learning training and inference workloads. Using NVIDIA's vDWS virtualization software, we run a series of HPC and deep learning benchmarks on both non-virtualized (bare metal) GPUs and vGPUs of various sizes and configurations. We report on several challenges we encountered in deploying and operating a variety of virtualized instance sizes and configurations. We find that the overhead of virtualization is generally less than 10% for HPC workloads, but can vary considerably for deep learning workloads, depending on the task.




