Exploring the Acceleration of Nekbone on Reconfigurable Architectures
Accelerators, FPGA, and GPUs
Time: Friday, 13 November 2020, 11:05am - 11:35am EDT
Description: Hardware technological advances are struggling to match scientific ambition, and a key question is how we can use the transistors that we already have more effectively. This is especially true for HPC, where the tendency is often to throw computation at a problem even though the codes themselves are commonly bound, at least to some extent, by other factors. By redesigning an algorithm and moving from a von Neumann to a dataflow style, there is potentially more opportunity to address these bottlenecks on reconfigurable architectures than on more general-purpose architectures.
In this paper we explore porting the AX kernel of Nekbone, a widely used HPC mini-app, to FPGAs using high-level synthesis via Vitis. While computation is an important part of this code, it is also memory bound on CPUs, and a key question is whether one can ameliorate this by leveraging FPGAs. We first explore optimization strategies for obtaining good performance, with over a 4000x runtime difference between the first and final versions of our kernel on FPGAs. Subsequently, the performance and energy efficiency of our approach on an Alveo U280 are compared against a 24-core Xeon Platinum CPU and an NVIDIA V100 GPU, with the FPGA outperforming the CPU by around 4x, achieving almost three quarters of the GPU performance, and being significantly more energy efficient than both. The result of this work is a comparison and a set of techniques that apply to Nekbone on FPGAs specifically and are also of wider interest for accelerating HPC codes on reconfigurable architectures.