Authors: Troels Henriksen and Sune Hellfritzsch (University of Copenhagen), Ponnuswamy Sadayappan (University of Utah), and Cosmin Oancea (University of Copenhagen)
Abstract: We present and evaluate an implementation technique for histogram-like computations on GPUs that ensures a work-efficient asymptotic cost, supports arbitrary associative and commutative operators, and makes efficient use of hardware-supported atomic operations when applicable. Based on a systematic empirical examination of the design space, we develop a technique that balances conflict rates and memory footprint.
We demonstrate our technique both as a library implementation in CUDA and as an extension of the parallel array language Futhark with a new construct for expressing generalized histograms, supported by several compiler optimizations. We show that our histogram implementation, taken in isolation, outperforms similar primitives from CUB, and that it is competitive with or outperforms the hand-written code of several application benchmarks, even when the latter is specialized for a class of datasets.