Dhiraj is a research scientist in Intel's Parallel Computing Lab in Bangalore. His research interests include parallel computer architecture, GPGPU architectures, and hardware-specific performance scaling optimizations for single-node and distributed systems. Recently, he has been working on analyzing and optimizing deep learning workloads, frameworks, and libraries for Intel Xeon and GPU architectures. Dhiraj led the early efforts to demonstrate the superiority of BFloat16 over INT16 for training DL workloads, helping set the direction of the low-precision Xeon DL roadmap for future generations. More generally, he has a proven track record of demonstrating how to achieve the best performance on Xeon processors. In the past, he has worked on optimizing various HPC workloads for Intel Xeon and MIC architectures and on architecting security and privacy for Xeon processors. Dhiraj joined Intel in 2006 as an RCG. Before joining Intel, he completed a Bachelor of Engineering at Government Engineering College, Aurangabad, Maharashtra, India, and earned an M.Tech. from IIT Kanpur, India.
Accelerators, FPGA, and GPUs
Machine Learning, Deep Learning and Artificial Intelligence