TOSS-2020: A Commodity Software Stack for HPC
Session: System Software at Scale
Event Type: Paper
Accelerators, FPGA, and GPUs
Reliability and Resiliency
Security
System Software and Runtime Systems
TP
Time: Tuesday, 17 November 2020, 3pm - 3:30pm EDT
Location: Track 4
Description: The simulation environment of any HPC platform is key to the performance, portability and productivity of scientific applications. This environment has traditionally been provided by platform vendors, which presents challenges for HPC centers and users, including platform-specific software that tends to stagnate over the lifetime of the system. In this paper, we present the Tri-Laboratory Operating System Stack (TOSS), a production simulation environment based on Linux and open source software, with proprietary software components integrated as needed. TOSS, which focuses on mid-to-large-scale commodity HPC systems, provides a common simulation environment across system architectures, reduces the learning curve on new systems, and benefits from a lineage of past experience and bug fixes. To extend the scope and applicability of TOSS, we demonstrate its feasibility and effectiveness on a leadership-class supercomputer architecture. Our evaluation, relative to the vendor stack, includes an analysis of resource manager complexity, system noise, networking and application performance.