SaddlebagX: High Performance Data Processing with PGAS and UPC++
Time: Thursday, 19 November 2020, 8:30am - 5pm EDT
Description: The ever-growing scale of data and the emerging challenges of processing it efficiently at that scale are driving the need for high-performance data processing frameworks that are efficient yet highly programmable across the big data analytics (BDA) and high-performance computing (HPC) communities. We introduce SaddlebagX, a new data-centric framework that offers the high programmability of BDA frameworks, such as the simple bulk synchronous parallel (BSP) programming model, while striving for high performance by leveraging the Partitioned Global Address Space (PGAS) computing paradigm and the Remote Memory Access (RMA) capability provided in HPC frameworks. Graph analytics and non-iterative bulk transformation benchmarks using SaddlebagX show significant performance gains (up to 40×) compared to their Apache Spark counterparts. To evaluate the overheads incurred by SaddlebagX, we compare SaddlebagX with UPC++ using a basic sparse matrix-vector multiplication (SpMV) benchmark and observe no noticeable slowdown. These results show that SaddlebagX is a high-performance data processing framework.