A Dataflow-Graph Partitioning Method for Training Large Deep Learning Models
Event Type: Workshop
System Software and Runtime Systems
Time: Friday, 13 November 2020, 11:15am - 11:45am EDT
Location: Track 7
Description: Large Deep Neural Network (DNN) models have substantial memory requirements for storing the model parameters and intermediate results. As a result, limited device memory becomes a bottleneck when training such models. We propose a deterministic, generic, and efficient partitioning strategy for DNNs represented as computational graphs. The proposed partitioning algorithm determines a placement of the operations in a DNN’s underlying computational graph across multiple accelerators so that the memory constraints of the devices are met and the training time is minimized. To the best of our knowledge, the strategy presented in this work is the first that is completely independent of the structure and operation types of DNN models. It therefore remains compatible with future models and can be applied to any emerging model, even one that bears no resemblance to existing models in structure, in the nature of its learning process, or in its operations. In this talk, I will present the details of the method along with performance data and a comparison with related work.
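
The abstract frames placement as a constrained assignment: map each operation of the computational graph to a device so that no device's memory budget is exceeded while keeping communicating operations close together. As a rough illustration of that problem setting only (the abstract does not specify the actual algorithm), below is a minimal greedy sketch in Python; the function place and its parameters (ops, mem, deps, device_capacity, num_devices) are hypothetical names introduced here, not part of the presented work.

    # Hypothetical sketch of memory-constrained placement, NOT the talk's
    # algorithm: assign ops to devices in topological order, never exceeding
    # a device's memory budget, and prefer the devices that already host a
    # predecessor to reduce inter-device transfers.

    def place(ops, mem, deps, device_capacity, num_devices):
        """ops: op ids in topological order; mem[op]: memory the op needs;
        deps[op]: list of predecessor op ids; returns {op: device id}."""
        used = [0] * num_devices          # memory committed per device
        placement = {}
        for op in ops:
            # Try devices hosting a predecessor first, then the rest.
            preferred = {placement[p] for p in deps.get(op, ())}
            candidates = list(preferred) + [d for d in range(num_devices)
                                            if d not in preferred]
            for d in candidates:
                if used[d] + mem[op] <= device_capacity:
                    placement[op] = d
                    used[d] += mem[op]
                    break
            else:
                raise MemoryError(f"operation {op!r} fits on no device")
        return placement

    # Example: a 4-op chain split across 2 devices with 10 memory units each.
    ops = ["a", "b", "c", "d"]
    mem = {"a": 6, "b": 6, "c": 3, "d": 3}
    deps = {"b": ["a"], "c": ["b"], "d": ["c"]}
    print(place(ops, mem, deps, device_capacity=10, num_devices=2))
    # -> {'a': 0, 'b': 1, 'c': 1, 'd': 0}

A greedy pass like this meets the memory constraints but makes no training-time guarantee; the talk's claimed contribution is a deterministic strategy that also minimizes training time while staying independent of the model's structure and operation types.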