SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Optimization of Tensor-Product Operations in Nekbone on GPUs


Authors: Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, and Stefano Markidis (KTH Royal Institute of Technology)

Abstract: In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and further optimize the main tensor-product operation in Nekbone. Our optimization is written in CUDA and uses a different, two-dimensional (2D) thread structure that performs the computations layer by layer. The results show that our implementation outperforms previous GPU Nekbone implementations by 6% to 10% on the Pascal and Volta GPU architectures. Compared to a measured roofline, we obtain 77% to 92% of the peak performance on both the Nvidia P100 and V100 GPUs for inputs with 1024 to 4096 elements and polynomial degree 9. In this poster, we discuss our findings and raise considerations for future work as we move toward exascale CFD simulations.
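As a rough illustration of the layer-by-layer idea mentioned in the abstract, the following is a minimal CPU sketch in Python, not the authors' CUDA kernel. It assumes the tensor-product operation amounts to applying one small derivative matrix D along each of the three directions of an n x n x n element (n = 10 for polynomial degree 9), and it omits everything else the real Nekbone operator does. The function name `local_grad_layered` and the `u[k][j][i]` layout are hypothetical choices for this example. The two inner loops over (j, i) stand in for a 2D thread block, and the outer loop over k walks through the layers.

```python
def local_grad_layered(u, D):
    """Apply D along each direction of one element, one k-layer at a time.

    u is indexed as u[k][j][i]; D is an n x n matrix (both assumptions of
    this sketch). Returns the three directional results (ur, us, ut).
    """
    n = len(D)
    ur = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    us = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    ut = [[[0.0] * n for _ in range(n)] for _ in range(n)]

    for k in range(n):              # march through the element layer by layer
        for j in range(n):          # (j, i) plays the role of a 2D thread index
            for i in range(n):
                r = s = t = 0.0
                for l in range(n):  # short contraction of length n
                    r += D[i][l] * u[k][j][l]   # derivative along i
                    s += D[j][l] * u[k][l][i]   # derivative along j
                    t += D[k][l] * u[l][j][i]   # derivative along k
                ur[k][j][i] = r
                us[k][j][i] = s
                ut[k][j][i] = t
    return ur, us, ut


if __name__ == "__main__":
    n = 3
    identity = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    u = [[[float(i + 2 * j + 4 * k) for i in range(n)]
          for j in range(n)] for k in range(n)]
    ur, us, ut = local_grad_layered(u, identity)
    print(ur == u and us == u and ut == u)
```

In a GPU version along these lines, each (j, i) pair would map to one thread of a 2D block, so a block needs only n x n threads per element rather than n x n x n. How the poster's kernel handles shared memory and registers is not stated in the abstract and is not modeled here.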

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF

