Authors: Kaiming Ouyang (University of California, Riverside); Min Si (Argonne National Laboratory (ANL)); Atsushi Hori (RIKEN Center for Computational Science (R-CCS)); Zizhong Chen (University of California, Riverside); and Pavan Balaji (Argonne National Laboratory (ANL))
Abstract: Load balance is essential for high-performance applications. Unbalanced communication can cause severe performance degradation, even in computation-balanced BSP applications. Designing communication-balanced applications is challenging, however, because of the diverse communication implementations in the underlying runtime system. In this paper, we address this challenge through an interprocess work-stealing scheme based on process-memory-sharing techniques. We present CAB-MPI, an MPI implementation that can identify idle processes inside MPI and use these idle resources to dynamically balance the communication workload on the node. We design throughput-optimized strategies to ensure efficient stealing of data-movement tasks. We demonstrate the benefit of work-stealing through several internal processes in MPI, including intranode data transfer, pack/unpack for noncontiguous communication, and computation in one-sided accumulates. The implementation is evaluated through a set of microbenchmarks and proxy applications on Intel Xeon and Xeon Phi platforms.