Workshop: ROSS 2020: 10th International Workshop on Runtime and Operating Systems for Supercomputers
Authors: Edgar A. Leon (Lawrence Livermore National Laboratory), Balazs Gerofi (RIKEN), Julien Jaeger (Atomic Energy and Alternative Energies Commission (CEA)), Guillaume Mercier (Bordeaux INP), Rolf Riesen (Intel Corporation), Masamichi Takagi (RIKEN), and Brice Goglin (French Institute for Research in Computer Science and Automation (INRIA))
Abstract: Emerging workloads on supercomputing platforms are pushing the limits of traditional high-performance computing software environments. Multi-physics, coupled simulations, big data processing and machine learning frameworks, and multi-component workloads pose serious challenges to system and application developers. At the heart of the problem is the lack of cross-stack coordination to enable flexible resource management among multiple runtime components.
In this work, we analyze seven real-world applications that represent emerging workloads and illustrate the scope and magnitude of the problem. We then extract several themes from these applications that highlight next-generation requirements for node resource managers. Finally, using these requirements, we propose a general, cross-stack coordination framework and outline its components and functionality.