SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Analyzing Interconnect Congestion on a Production Dragonfly-Based System

Authors: Joy Kitson (University of Maryland, Argonne National Laboratory (ANL)); Sudheer Chunduri (Argonne National Laboratory (ANL)); and Abhinav Bhatele (University of Maryland)

Abstract: As the HPC community continues along the road to exascale and HPC systems grow ever larger and busier, the question of how network traffic on these systems affects application performance looms large. To fully address this question, the HPC community needs a broader understanding of how traffic behaves on production systems. We present an analysis of communication traffic on the Theta cluster at the Argonne Leadership Computing Facility (ALCF), with a focus on how congestion is distributed in both space and time across the system.

Best Poster Finalist (BP): no


