SC20 Virtual Platform
Tools and Best Practices for Distributed Deep Learning on Supercomputer
Event Type
Tutorial
Tags
Best Practices
Big Data
Machine Learning, Deep Learning and Artificial Intelligence
Registration Categories
TUT
Time
Monday, 9 November 2020, 2:30pm - 6:30pm EST
Location
Track 7
Description
This tutorial is a practical guide to running distributed deep learning (DL) effectively across multiple compute nodes. Domain scientists are embracing DL both as a standalone data science method and as an effective approach to reducing dimensionality in traditional simulation. DL and high-performance computing (HPC) are converging: supercomputers show an unparalleled capacity to reduce DL training time, and HPC techniques have been used to speed up parallel DL training. Distributed deep learning has great potential to augment DL applications by leveraging existing HPC clusters. In this tutorial, we will give an overview of state-of-the-art approaches to enabling deep learning at scale, followed by an interactive hands-on session to help attendees run distributed deep learning on Frontera at the Texas Advanced Computing Center. Lastly, we will discuss best practices for scaling, evaluating, and tuning performance.