Distributed BERT Pre-Training And Fine-Tuning with Intel-Optimized TensorFlow On Intel Xeon Scalable Processors
Time: Thursday, 19 November 2020, 8:30am - 5pm EDT
Description: Distributed computing has become a key component in the field of data science, allowing for faster prototyping and accelerated time to market for numerous workloads. This work examines the distributed training performance of BERT, a state-of-the-art language model for natural language processing (NLP), in the tasks of pre-training and fine-tuning on general-purpose Intel CPUs. The effects of using Intel-optimized TensorFlow on Intel Architecture with both FP32 and BFLOAT16 floating-point formats are included in the analysis. Results show that the distributed TensorFlow BERT model with the LAMB optimizer can maintain high accuracy while achieving good performance speedups from scaling to a larger number of Intel Xeon CPUs.
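The LAMB optimizer mentioned above (You et al., "Large Batch Optimization for Deep Learning") rescales each layer's update by a trust ratio, which is what allows large-batch distributed training to retain accuracy. A minimal NumPy sketch of a single LAMB step for one layer is shown below; this is an illustrative simplification, not the TensorFlow or Intel implementation, and the function name `lamb_step` and its default hyperparameters are assumptions for the example.

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.01):
    """One illustrative LAMB update for a single layer's weights w
    given gradient g and moment estimates m, v at step t."""
    # Adam-style first and second moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adam direction plus decoupled weight decay.
    r = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
    # Layer-wise trust ratio ||w|| / ||r||: this per-layer rescaling is
    # LAMB's key difference from Adam and what preserves accuracy at
    # the very large batch sizes used in distributed BERT training.
    w_norm, r_norm = np.linalg.norm(w), np.linalg.norm(r)
    trust_ratio = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    w = w - lr * trust_ratio * r
    return w, m, v

# One step on a toy weight matrix.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4))
g = rng.standard_normal((4, 4))
m = np.zeros_like(w)
v = np.zeros_like(w)
w_new, m, v = lamb_step(w, g, m, v, t=1)
```

In a real distributed run, each worker would apply this update after gradients are averaged across workers (e.g. via an all-reduce), with one trust ratio computed per layer rather than per model.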