Authors: Aril B. Ovesen (University of Tromsø – The Arctic University of Norway); Amin M. Khan (Superior Técnico, University of Lisbon); and Phuong Ngoc Chau and Phuong Hoai Ha (University of Tromsø – The Arctic University of Norway)
Abstract: The ever-growing scale of data, and the emerging challenges of processing it efficiently at that scale, are driving the need for high-performance data processing frameworks that are efficient yet highly programmable across the big data analytics (BDA) and high-performance computing (HPC) communities. We introduce SaddlebagX, a new data-centric framework that offers the high programmability of BDA frameworks, such as the simple bulk synchronous parallel (BSP) programming model, while striving for high performance by leveraging the Partitioned Global Address Space (PGAS) computing paradigm and the Remote Memory Access (RMA) capability provided in HPC frameworks. Graph analytics and non-iterative bulk transformation benchmarks using SaddlebagX show significant performance gains (up to 40×) compared to their Apache Spark counterparts. To evaluate the overheads incurred by SaddlebagX, we compare SaddlebagX with UPC++ using a basic sparse matrix-vector multiplication (SpMV) benchmark and observe no noticeable slowdown. These results show that SaddlebagX is a high-performance data processing framework.
Best Poster Finalist (BP): no
Poster: PDF
Poster summary: PDF