GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks
Accelerators, FPGA, and GPUs
Machine Learning, Deep Learning and Artificial Intelligence
Time: Wednesday, 18 November 2020, 4pm - 4:30pm EST
Description: The acceleration of Graph Neural Networks (GNNs) requires efficient and framework-compatible Sparse-Dense Matrix-Matrix Multiplication (SpMM). From the compatibility perspective, the sophisticated sparse matrix representations in state-of-the-art SpMM designs cause heavy preprocessing overhead for the framework. From the efficiency perspective, optimizations for Sparse Matrix-Vector (SpMV) do not apply well to SpMM, leading to redundant and uncoalesced global memory access. We propose GE-SpMM, which takes the CSR format consistent with GNN frameworks to enable integration without the format transformation overhead. We use coalesced row caching to ensure coalesced access to both sparse and dense data in the global memory. We use coarse-grained warp merging to reduce redundant data loading among GPU warps. Experiments on a real-world graph dataset demonstrate up to 1.41× speedup over Nvidia cuSPARSE and up to 1.81× over GraphBLAST. We embed GE-SpMM in GNN frameworks and get up to 3.67× speedup on popular GNN models like GCN and GraphSAGE.
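For readers unfamiliar with the operation, the following is a minimal CPU reference sketch of SpMM over a CSR matrix, the computation that GE-SpMM accelerates on the GPU. It is not the GE-SpMM kernel and does not model coalesced row caching or warp merging; the function name `csr_spmm`, the argument names, and the 3-node example graph are assumptions made for illustration.

```python
def csr_spmm(indptr, indices, data, dense):
    """Reference C = A @ B, with A in CSR form and B a dense list of rows.

    indptr, indices, data are the usual CSR arrays (hypothetical names).
    """
    n_rows = len(indptr) - 1
    n_cols = len(dense[0]) if dense else 0
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Non-zeros of sparse row i occupy positions indptr[i] .. indptr[i+1]-1.
        for p in range(indptr[i], indptr[i + 1]):
            k, a_ik = indices[p], data[p]
            b_row = dense[k]
            # Each sparse entry is read once and reused for every dense column.
            for j in range(n_cols):
                out[i][j] += a_ik * b_row[j]
    return out


# Illustrative 3-node graph: A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
indptr = [0, 2, 3, 5]
indices = [0, 2, 1, 0, 2]
data = [1.0, 2.0, 3.0, 4.0, 5.0]
# Dense feature matrix with two features per node.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

result = csr_spmm(indptr, indices, data, features)
print(result)
```

In a GNN layer such as GCN, A would typically be the (normalized) adjacency matrix and B the node feature matrix, so each output row aggregates the features of a node's neighbors.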