Workshop: H2RC 2020: Sixth International Workshop on Heterogeneous High-Performance Reconfigurable Computing
Authors: Norihisa Fujita, Ryohei Kobayashi, Yoshiki Yamaguchi, Taisuke Boku, Kohji Yoshikawa, Makito Abe, and Masayuki Umemura (University of Tsukuba)
Abstract: In earlier work, we optimized the Authentic Radiative Transfer (ART) method, which solves space radiative transfer problems in early-universe astrophysical simulations, for Intel Arria 10 FPGAs. In this paper, we optimize it for the latest FPGA, the Intel Stratix 10, and compare its multi-node performance with that of a GPU implementation. As the multi-FPGA computing and communication framework, we apply our original system, the Communication Integrated Reconfigurable CompUting System (CIRCUS), which enables OpenCL-based programming that uses multiple optical links on each FPGA for parallel FPGA processing. This is the first implementation of a real application on CIRCUS.
The FPGA implementation is 4.54, 8.41, and 10.64 times faster than the GPU implementation on 1, 2, and 4 nodes, respectively, where the multi-GPU runs use an InfiniBand HDR100 network. It also achieves 94.2% parallel efficiency on 4 FPGAs. We attribute this efficiency to CIRCUS's low-latency, high-efficiency pipelined communication, which also makes multi-FPGA programming with OpenCL straightforward for high-performance computing applications.
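The growing FPGA-over-GPU ratio (4.54 on 1 node, 10.64 on 4 nodes) says something about how the GPU version scales. The sketch below is a back-of-the-envelope calculation, not a result from the paper. It assumes that both platforms run the same problem configuration at each node count and that each platform's parallel efficiency is measured against its own 1-node run. Under those assumptions, E_gpu(n) = E_fpga(n) * ratio(1) / ratio(n), where ratio(n) = T_gpu(n) / T_fpga(n). The identity holds for both the strong-scaling and the weak-scaling definition of efficiency. The function name `implied_gpu_efficiency` is hypothetical.

```python
# Back-of-the-envelope derivation from the figures quoted in the abstract.
# ASSUMPTION (not stated in the source): both platforms solve the same problem
# configuration at each node count, and efficiency is measured against each
# platform's own 1-node run.

FPGA_OVER_GPU_1 = 4.54   # T_gpu(1) / T_fpga(1), from the abstract
FPGA_OVER_GPU_4 = 10.64  # T_gpu(4) / T_fpga(4), from the abstract
FPGA_EFF_4 = 0.942       # reported FPGA parallel efficiency on 4 nodes


def implied_gpu_efficiency(fpga_eff_n: float, ratio_1: float, ratio_n: float) -> float:
    """Hypothetical helper: GPU efficiency on n nodes implied by the FPGA one.

    With ratio(n) = T_gpu(n) / T_fpga(n), substituting T_gpu = ratio * T_fpga
    into E(n) = T(1) / (n * T(n)) (strong scaling) or E(n) = T(1) / T(n)
    (weak scaling) gives E_gpu(n) = E_fpga(n) * ratio(1) / ratio(n).
    """
    return fpga_eff_n * ratio_1 / ratio_n


if __name__ == "__main__":
    e_gpu = implied_gpu_efficiency(FPGA_EFF_4, FPGA_OVER_GPU_1, FPGA_OVER_GPU_4)
    print(f"Implied GPU parallel efficiency on 4 nodes: {e_gpu:.1%}")
```

Under these assumptions the GPU version would reach only about 40% parallel efficiency on 4 nodes, compared with 94.2% for the FPGA version. This is consistent with the abstract's attribution of the widening gap to CIRCUS's communication. Treat the figure as an estimate that depends on the stated assumptions.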