SC20 Virtual Platform
Recursive Basic Linear Algebra Operations on TensorCore GPU
Event Type
Workshop
Tags
Algorithms
Extreme Scale Computing
Performance/Productivity Measurement and Evaluation
Scalable Computing
Scientific Computing
Registration Categories
W
Time: Thursday, 12 November 2020, 3:45pm - 4:10pm EDT
Location: Track 8
Description: Driven by the demands of high-speed matrix computation and deep neural network training, NVIDIA introduced TensorCores in its GPUs to further accelerate matrix-matrix multiplication. TensorCores support very fast half-precision general matrix-matrix multiplications (GEMMs), which are roughly 8x faster than single-precision CUDA-core GEMMs. So far, however, the use of TensorCore GPUs for matrix operations other than matrix-matrix multiplication remains underdeveloped. In this paper, we propose efficient BLAS3 operations that exploit TensorCores. Experimental results show that the proposed algorithms outperform the corresponding cuBLAS routines and a naive TensorCore implementation, with speedups of up to 4.7x.