Authors: Muhammed Emin Ozturk (Ohio State University, University of Utah); Wei Wang, Maciej Szankin, and Lei Shao (Intel Corporation)
Abstract: Distributed computing has become a key component in the field of data science, allowing for faster prototyping and accelerated time to market for numerous workloads. This work examines the distributed training performance of BERT, a state-of-the-art language model for natural language processing (NLP), in the tasks of pre-training and fine-tuning on general-purpose Intel CPUs. The effects of using Intel-optimized TensorFlow on Intel Architecture with both FP32 and BFLOAT16 floating-point formats are included in the analysis. Results show that the distributed TensorFlow BERT model with the LAMB optimizer can maintain high accuracy while achieving good performance speedups from scaling to a larger number of Intel Xeon CPUs.
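As a rough illustration of the kind of setup the abstract describes, the sketch below wires together BFLOAT16 mixed precision, a LAMB optimizer, and multi-worker data parallelism in TensorFlow. This is not the authors' code: Horovod, TensorFlow Addons (for tfa.optimizers.LAMB), and the toy stand-in model and dataset are assumptions made for illustration only; the poster itself trains and fine-tunes the full BERT model on Intel Xeon CPUs with Intel-optimized TensorFlow.

# Minimal sketch (assumptions noted above), not the authors' implementation.
import tensorflow as tf
import tensorflow_addons as tfa          # assumption: LAMB comes from TensorFlow Addons
import horovod.tensorflow.keras as hvd   # assumption: Horovod provides the scaling layer

hvd.init()  # one process per CPU socket or node

# Intel-optimized TensorFlow executes supported ops in BFLOAT16 under this policy.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

# Toy stand-in for a BERT classifier; the final layer is kept in FP32 for stable logits.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="gelu", input_shape=(768,)),
    tf.keras.layers.Dense(2, dtype="float32"),
])

# Synthetic data, sharded across workers so each rank sees a distinct slice.
features = tf.random.normal((1024, 768))
labels = tf.random.uniform((1024,), maxval=2, dtype=tf.int32)
train_ds = (tf.data.Dataset.from_tensor_slices((features, labels))
            .shard(hvd.size(), hvd.rank())
            .batch(32))

# Scale the base learning rate with the worker count; LAMB is designed for large batches.
base_lr = 2e-5
opt = tfa.optimizers.LAMB(learning_rate=base_lr * hvd.size(), weight_decay_rate=0.01)
opt = hvd.DistributedOptimizer(opt)      # all-reduce gradients across workers

model.compile(
    optimizer=opt,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)

# Broadcast initial weights from rank 0 so every worker starts from the same state.
callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
model.fit(train_ds, epochs=3, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)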
Best Poster Finalist (BP): no
Poster: PDF
Poster summary: PDF