Analyzing Interconnect Congestion on a Production Dragonfly-Based System
Time: Thursday, 19 November 2020, 8:30am - 5pm EST
Description: As the HPC community continues along the road to exascale, and HPC systems grow ever larger and busier, the question of how network traffic on these systems affects application performance looms large. To fully address this question, the HPC community needs a broader understanding of how traffic behaves on production systems. We present an analysis of communications traffic on the Theta cluster at the Argonne Leadership Computing Facility (ALCF), focusing on how congestion is distributed across the system in both space and time.