Toward Automated Kernel Fusion for the Optimization of Scientific Applications
Description: We introduce a novel transformation pass, implemented in LLVM, that performs kernel fusion. We demonstrate the correctness and performance of the pass on several example programs inspired by scientific applications of interest. The method achieves up to a 4x speedup relative to unfused versions of the programs and matches the performance of manually fused versions. In contrast to previous work, it requires minimal user intervention. Our approach is enabled by a new loop fusion algorithm that can interprocedurally fuse both skewed and unskewed loops residing in different kernels.
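For readers unfamiliar with the term, the sketch below illustrates what kernel fusion means for the simplest (unskewed) case. It is a hand-written illustration, not the authors' pass or its output: the function names `scale`, `offset`, `run_unfused`, and `run_fused` are hypothetical, and the fusion is done manually here, whereas the abstract describes performing it automatically across procedure boundaries.

```cpp
#include <cstddef>
#include <vector>

// Unfused version: two separate "kernels", each making a full pass over memory
// and communicating through an intermediate buffer `tmp`.
void scale(const std::vector<double>& in, std::vector<double>& tmp, double a) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        tmp[i] = a * in[i];
    }
}

void offset(const std::vector<double>& tmp, std::vector<double>& out, double b) {
    for (std::size_t i = 0; i < tmp.size(); ++i) {
        out[i] = tmp[i] + b;
    }
}

std::vector<double> run_unfused(const std::vector<double>& in, double a, double b) {
    std::vector<double> tmp(in.size());
    std::vector<double> out(in.size());
    scale(in, tmp, a);    // first pass: writes tmp
    offset(tmp, out, b);  // second pass: reads tmp back
    return out;
}

// Manually fused version: both loop bodies run in a single pass, so the
// intermediate buffer and the second traversal of memory disappear.
std::vector<double> run_fused(const std::vector<double>& in, double a, double b) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = a * in[i] + b;
    }
    return out;
}
```

Both versions compute the same values; the fused one avoids storing and reloading the intermediate array, which is the kind of saving that fusion typically targets. A skewed case, where the second loop reads an element at a shifted index such as `tmp[i - 1]`, needs the iteration spaces to be aligned before the bodies can be merged and is not shown here.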