Achieving the Performance of Global Adaptive Routing Using Local Information on Dragonfly through Deep Learning
Time: Thursday, 19 November 2020, 8:30am - 5pm EDT
Description: Universal Globally Adaptive Load-balanced (UGAL) routing with global information, referred to as UGAL-G, represents an ideal form of adaptive routing on Dragonfly. However, UGAL-G is impractical to implement because global information cannot be maintained accurately. Practical adaptive routing schemes, such as UGAL with local information (UGAL-L), perform noticeably worse than UGAL-G. In this work, we investigate a machine learning approach to routing on Dragonfly. Specifically, we develop a machine learning-based routing scheme, called UGAL-ML, that can make routing decisions like those of UGAL-G using only information local to each router. Our preliminary evaluation indicates that UGAL-ML can achieve performance comparable to UGAL-G for some traffic patterns.
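For context, the decision that UGAL-L and UGAL-G both make can be sketched as the textbook UGAL rule. Each packet chooses between a minimal path and a non-minimal (Valiant) path by comparing estimated costs, where each cost is queue occupancy times hop count. UGAL-L feeds this rule the queue lengths it can observe at the current router, whereas the idealized UGAL-G feeds it globally known queue lengths. This is a minimal sketch under those assumptions, not the authors' UGAL-ML model. The function name `ugal_route_minimal` and the `threshold` bias parameter are hypothetical, and the example queue lengths and hop counts are illustrative only.

```python
def ugal_route_minimal(q_min: int, h_min: int,
                       q_nonmin: int, h_nonmin: int,
                       threshold: int = 0) -> bool:
    """Return True to take the minimal path, False to take the non-minimal path.

    Hypothetical sketch of the classic UGAL rule:
      - q_min / q_nonmin: queue-length estimates for each candidate path.
        For UGAL-L these come from the current router's own output queues;
        for the idealized UGAL-G they are assumed to be known globally.
      - h_min / h_nonmin: hop counts of the two candidate paths.
      - threshold: assumed non-negative bias that favors the minimal path.
    """
    # Estimated cost of a path = queue occupancy x number of hops.
    minimal_cost = q_min * h_min
    nonminimal_cost = q_nonmin * h_nonmin
    # Ties (and anything within the threshold) go to the minimal path.
    return minimal_cost <= nonminimal_cost + threshold


# Lightly loaded minimal path: 2*3 = 6 <= 2*5 = 10, so route minimally.
print(ugal_route_minimal(q_min=2, h_min=3, q_nonmin=2, h_nonmin=5))
# Congested minimal path: 10*3 = 30 > 2*5 = 10, so route non-minimally.
print(ugal_route_minimal(q_min=10, h_min=3, q_nonmin=2, h_nonmin=5))
```

The gap between UGAL-L and UGAL-G in this framing lies entirely in the quality of the `q_min` and `q_nonmin` estimates: local queues can look uncongested even when the global channel further along the minimal path is saturated, which is the information UGAL-ML aims to compensate for.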