Online-Codistillation Meets LARS: Going beyond the Limit of Data Parallelism in Deep Learning

SC20 Proceedings

Workshop: The 5th Deep Learning on Supercomputers Workshop

Authors: Shogo Murai, Hiroaki Mikami, Masanori Koyama, Shuji Suzuki, and Takuya Akiba (Preferred Networks Inc)

Abstract: Data-parallel training is a powerful family of methods for efficiently training deep neural networks on big data. However, recent studies have shown that the benefit of increasing the batch size, in terms of both training speed and model performance, diminishes rapidly beyond a certain point. This appears to hold even for LARS, the state-of-the-art stochastic optimization method for large-batch training.

In this paper, we combine LARS with online-codistillation, a recently developed and efficient deep learning algorithm built on an entirely different philosophy: stabilizing the training procedure with a collaborative ensemble of models. We show that the combination of large-batch training and online-codistillation is much more efficient than either one alone. We also present a novel way of implementing online-codistillation that further speeds up the computation. We demonstrate the efficacy of our approach on various benchmark datasets.
