Machine Learning for Data Transfer Anomaly Detection
Time: Thursday, 19 November 2020, 8:30am - 5pm EDT
Description: Data transfer performance is critical for many science applications that rely on remote clusters to process their data. Despite the presence of high-speed research networks offering up to 100 Gbps, most data transfers achieve only a fraction of the available network bandwidth, for a variety of reasons. This project aims to pinpoint the underlying causes of performance anomalies by collecting and processing real-time performance metrics from file systems, data transfer nodes, and networks, so that proper action can be taken to mitigate issues in a timely manner. Because the veracity and velocity of performance statistics are beyond what human operators can handle, we trained a neural network (NN) model to analyze the data in real time and make high-accuracy predictions. The results indicate that the NN model identifies the correct anomaly type with 93% accuracy.