SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Achieving the Performance of Global Adaptive Routing Using Local Information on Dragonfly through Deep Learning

Authors: Ram Sharan Chaulagain and Fatema Tabassum Liza (Florida State University), Sudheer Chunduri (Argonne National Laboratory (ANL)), Xin Yuan (Florida State University), and Michael Lang (Los Alamos National Laboratory)

Abstract: Universal Globally Adaptive Load-balance Routing (UGAL) with global information, referred to as UGAL-G, represents an ideal form of adaptive routing on Dragonfly. UGAL-G is impractical to implement, however, since the global information cannot be maintained accurately. Practical adaptive routing schemes, such as UGAL with local information (UGAL-L), perform noticeably worse than UGAL-G. In this work, we investigate a machine learning approach for routing on Dragonfly. Specifically, we develop a machine learning-based routing scheme, called UGAL-ML, that is capable of making routing decisions like UGAL-G based only on the information local to each router. Our preliminary evaluation indicates that UGAL-ML can achieve comparable performance to UGAL-G for some traffic patterns.
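
For readers unfamiliar with UGAL, the following is a minimal sketch of the path-selection rule as it is commonly described, not the authors' implementation: each packet compares an estimated delay (queue occupancy times hop count) for a minimal path against that of a non-minimal (Valiant) path and takes the cheaper one. The function name `ugal_choose`, its parameters, and the `bias` term are hypothetical names introduced for illustration. In this reading, UGAL-G would supply queue occupancies observed across the whole network, while UGAL-L would supply only the occupancies visible at the source router.

```python
def ugal_choose(q_min, hops_min, q_val, hops_val, bias=0):
    """Pick a path class under a UGAL-style rule (illustrative sketch).

    q_min, q_val: queue occupancy estimates for the minimal and the
        non-minimal (Valiant) candidate paths. Assumption: how these are
        measured (local vs. global) is what separates UGAL-L from UGAL-G.
    hops_min, hops_val: hop counts of the two candidate paths.
    bias: hypothetical threshold added to the non-minimal cost so that
        minimal routing is preferred when the two costs are close.
    """
    cost_min = q_min * hops_min        # estimated delay on the minimal path
    cost_val = q_val * hops_val + bias # estimated delay on the Valiant path
    return "min" if cost_min <= cost_val else "val"


if __name__ == "__main__":
    # Lightly loaded minimal path: stay minimal.
    print(ugal_choose(q_min=1, hops_min=3, q_val=1, hops_val=5))
    # Congested minimal path: divert to the non-minimal path.
    print(ugal_choose(q_min=10, hops_min=3, q_val=2, hops_val=5))
```

Under this sketch, a learned scheme such as the UGAL-ML described in the abstract would replace the hand-written comparison with a model that maps locally observable inputs to the decision UGAL-G would have made; the poster abstract does not specify the model or its inputs.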

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF
