Authors: Muhammed Emin Ozturk (Ohio State University, University of Utah); Wei Wang, Maciej Szankin, and Lei Shao (Intel Corporation)
Abstract: Distributed computing has become a key component in the field of data science, allowing for faster prototyping and accelerated time to market for numerous workloads. This work examines the distributed training performance of BERT, a state-of-the-art language model for natural language processing (NLP), in the tasks of pre-training and fine-tuning on general-purpose Intel CPUs. The effects of using Intel-optimized TensorFlow on Intel Architecture with both FP32 and BFLOAT16 floating-point formats are included in the analysis. Results show that the distributed TensorFlow BERT model with the LAMB optimizer can maintain high accuracy while achieving good performance speedups from scaling to a larger number of Intel Xeon CPUs.
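As a rough illustration of the kind of setup the abstract describes, the sketch below wires together BFLOAT16 mixed precision, a LAMB optimizer, and multi-worker data parallelism in TensorFlow. This is not the authors' code: Horovod, TensorFlow Addons (for tfa.optimizers.LAMB), and the toy stand-in model and dataset are assumptions made for illustration only; the poster itself trains and fine-tunes the full BERT model on Intel Xeon CPUs with Intel-optimized TensorFlow.

# Minimal sketch (assumptions noted above), not the authors' implementation.
import tensorflow as tf
import tensorflow_addons as tfa          # assumption: LAMB comes from TensorFlow Addons
import horovod.tensorflow.keras as hvd   # assumption: Horovod provides the scaling layer

hvd.init()  # one process per CPU socket or node

# Intel-optimized TensorFlow executes supported ops in BFLOAT16 under this policy.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

# Toy stand-in for a BERT classifier; the final layer is kept in FP32 for stable logits.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="gelu", input_shape=(768,)),
    tf.keras.layers.Dense(2, dtype="float32"),
])

# Synthetic data, sharded across workers so each rank sees a distinct slice.
features = tf.random.normal((1024, 768))
labels = tf.random.uniform((1024,), maxval=2, dtype=tf.int32)
train_ds = (tf.data.Dataset.from_tensor_slices((features, labels))
            .shard(hvd.size(), hvd.rank())
            .batch(32))

# Scale the base learning rate with the worker count; LAMB is designed for large batches.
base_lr = 2e-5
opt = tfa.optimizers.LAMB(learning_rate=base_lr * hvd.size(), weight_decay_rate=0.01)
opt = hvd.DistributedOptimizer(opt)      # all-reduce gradients across workers

model.compile(
    optimizer=opt,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Broadcast initial weights from rank 0 so every worker starts from the same state.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(train_ds, epochs=3, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)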
Best Poster Finalist (BP): no
Poster: PDF
Poster summary: PDF