SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

SaddlebagX: High Performance Data Processing with PGAS and UPC++

Authors: Aril B. Ovesen (University of Tromsø – The Arctic University of Norway); Amin M. Khan (Superior Técnico, University of Lisbon); and Phuong Ngoc Chau and Phuong Hoai Ha (University of Tromsø – The Arctic University of Norway)

Abstract: The ever-growing scale of data, and the challenge of processing it efficiently at that scale, are driving the need for data processing frameworks that are both efficient and highly programmable across the big data analytics (BDA) and high-performance computing (HPC) communities. We introduce SaddlebagX, a new data-centric framework that offers the high programmability of BDA frameworks, such as the simple BSP programming model, while striving for high performance by leveraging the Partitioned Global Address Space (PGAS) computing paradigm and the Remote Memory Access (RMA) capability provided in HPC frameworks. Graph analytics and non-iterative bulk transformation benchmarks using SaddlebagX show significant performance gains (up to 40×) compared to their Apache Spark counterparts. To evaluate the overheads incurred by SaddlebagX, we compare SaddlebagX with UPC++ using a basic sparse matrix-vector multiplication (SpMV) benchmark and observe no noticeable slowdown. These results show that SaddlebagX is a high-performance data processing framework.
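
For readers unfamiliar with the SpMV kernel named in the abstract, the following is a minimal serial sketch in plain C++ of y = A * x with A stored in compressed sparse row (CSR) form. It is not the benchmark code from the poster and uses no SaddlebagX or UPC++ API; the CsrMatrix struct, its field names, and the spmv function are hypothetical names chosen for illustration, and the CSR layout is an assumption, since the abstract does not state a storage format.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container for illustration only;
// not a SaddlebagX or UPC++ type.
struct CsrMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/values
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// Computes y = A * x for a CSR matrix A (serial, single address space).
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.cols);
    assert(a.row_ptr.size() == a.rows + 1);

    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r) {
        double sum = 0.0;
        // Nonzeros of row r occupy positions [row_ptr[r], row_ptr[r + 1]).
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

In a distributed setting, rows would typically be partitioned across processes and entries of x owned by other processes fetched remotely; that part is omitted here because the abstract does not describe how either framework does it.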

Best Poster Finalist (BP): no

