Toward Modular Supercomputing: Resource Disaggregation and Virtualization by Network-Attached Accelerators
Education, Training and Outreach
Time: Wednesday, 11 November 2020, 4pm - 4:05pm EDT
Description: This lightning talk presents the Network-Attached Accelerator (NAA) approach, which increases communication efficiency by moving accelerators into a massively parallel “stand-alone” computing cluster. The architecture disaggregates the accelerators from their PCI Express hosts and virtualizes the hardware over the network. This has two effects: remote accelerators can communicate directly and transparently over the Extoll high-speed network without any host interaction, and applications can be mapped optimally onto compute resources. From a user’s perspective, the NAA software environment provides a transparent mapping between the remote accelerators and the cluster nodes by emulating the PCI Express subsystem for the underlying operating system. The architectural idea derives from the European Dynamical Exascale Entry Platform (DEEP) project series, which introduced the Cluster-Booster concept to enable a larger number of codes to exploit the advantages of highly scalable systems while improving the energy efficiency and scalability of cluster computers.
Initial prototype implementations with different accelerators, such as Intel Xeon Phi coprocessors and NVIDIA GPGPUs of the Tesla generation, show promising performance results for both microbenchmarks and application benchmarks. For example, the communication time between remote GPUs can be reduced by up to 47%. Furthermore, the Network-Attached Accelerator approach serves as the prototype compute module for the Modular Supercomputing Architecture (MSA), which extends and generalizes the Cluster-Booster concept. The idea is to connect compute modules with different hardware and performance characteristics (such as cluster, data analytics, booster, and quantum computing modules) to create a single heterogeneous system.
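The application-to-module mapping idea behind the MSA can be illustrated with a small sketch. This is not the DEEP or NAA software; the class names, the `map_phases` helper, and the capability tags are all hypothetical and chosen only to show how the phases of one application might be matched to modules with different characteristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A compute module, described by hypothetical capability tags."""
    name: str
    strengths: frozenset


@dataclass(frozen=True)
class Phase:
    """One phase of an application, described by what it needs."""
    name: str
    needs: frozenset


def map_phases(phases, modules):
    """Assign each phase to the module that covers most of its needs.

    Ties are broken in favor of the module listed first.
    """
    mapping = {}
    for phase in phases:
        best = max(modules, key=lambda m: len(phase.needs & m.strengths))
        mapping[phase.name] = best.name
    return mapping


# Illustrative modules and tags (assumptions, not taken from the talk).
MODULES = [
    Module("cluster", frozenset({"complex-control-flow", "single-thread"})),
    Module("booster", frozenset({"highly-parallel", "vectorizable"})),
    Module("data-analytics", frozenset({"large-memory", "io-heavy"})),
]

PHASES = [
    Phase("setup", frozenset({"complex-control-flow"})),
    Phase("solver", frozenset({"highly-parallel", "vectorizable"})),
    Phase("postprocess", frozenset({"io-heavy"})),
]

if __name__ == "__main__":
    for phase, module in map_phases(PHASES, MODULES).items():
        print(f"{phase} -> {module}")
```

In a real modular system the mapping would be decided by the scheduler and resource manager from measured performance characteristics rather than from static tags; the sketch only shows the matching principle.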