SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Machine Learning for Data Transfer Anomaly Detection

Authors: Masud Bhuiyan, Sarah Cooper, and Engin Arslan (University of Nevada, Reno)

Abstract: Data transfer performance is critical for many science applications that rely on remote clusters to process their data. Although high-speed research networks offer speeds of up to 100 Gbps, most data transfers achieve only a fraction of the available bandwidth, for a variety of reasons. This project aims to pinpoint the underlying causes of performance anomalies by collecting and processing real-time performance metrics from file systems, data transfer nodes, and networks, so that the issues can be mitigated promptly. Because the veracity and velocity of the performance statistics are beyond what human operators can handle, we trained a neural network (NN) model to analyze the data in real time and make high-accuracy predictions. The results indicate that the NN model identifies the correct anomaly type with 93% accuracy.

Best Poster Finalist (BP): no

