SC20 Is Everywhere We Are

SEFEE: Lightweight Storage Error Forecasting in Large-Scale Enterprise Storage Systems
Event Type
Paper
Tags
Fault Tolerance
Reliability and Resiliency
Storage
Registration Categories
TP
Time
Wednesday, 18 November 2020, 1pm - 1:30pm EDT
Location
Track 2
Description
With the rapid growth in scale and complexity, today's enterprise storage systems need to deal with a significant number of errors. Existing proactive methods mainly focus on machine learning techniques trained on SMART measurements. Such methods, however, are usually expensive to use in practice and can only be applied to a limited set of error types at a limited scale. We collected more than 23 million storage events over two years from 87 deployed NetApp-ONTAP systems managing 14,371 disks, and we propose SEFEE, a lightweight, training-free storage error forecasting method. SEFEE employs tensor decomposition to directly analyze storage error-event logs and perform online error prediction for all error types in all storage nodes. SEFEE explores hidden spatiotemporal information that is deeply embedded in the global scale of storage systems to achieve record-breaking error forecasting accuracy with minimal prediction overhead.
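To give a feel for what tensor decomposition over error-event logs can look like, here is a minimal sketch. It is not SEFEE's algorithm, which the abstract does not spell out. It assumes the logs have already been aggregated into a count tensor indexed by (storage node, error type, time window), fits a rank-1 CP approximation by alternating updates, and then scores the next window by averaging the most recent temporal factor values. The function names, the rank-1 choice, and the averaging step are all illustrative assumptions.

```python
import math


def rank1_cp(tensor, iters=50):
    """Rank-1 CP approximation: tensor[i][j][k] ~ lam * a[i] * b[j] * c[k].

    Uses alternating updates (higher-order power method). This is an
    illustrative stand-in, not the decomposition used by SEFEE.
    """
    I, J, K = len(tensor), len(tensor[0]), len(tensor[0][0])
    a, b, c = [1.0] * I, [1.0] * J, [1.0] * K

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:
            return v, 0.0
        return [x / norm for x in v], norm

    lam = 0.0
    for _ in range(iters):
        # Update each factor with the other two held fixed, then normalize.
        a, _ = normalize([
            sum(tensor[i][j][k] * b[j] * c[k] for j in range(J) for k in range(K))
            for i in range(I)
        ])
        b, _ = normalize([
            sum(tensor[i][j][k] * a[i] * c[k] for i in range(I) for k in range(K))
            for j in range(J)
        ])
        c, lam = normalize([
            sum(tensor[i][j][k] * a[i] * b[j] for i in range(I) for j in range(J))
            for k in range(K)
        ])
    return lam, a, b, c


def forecast_next_window(tensor, window=2):
    """Score each (node, error type) pair for the next time window.

    Assumption: the next temporal factor value is approximated by the mean
    of the last `window` values. This is a naive extrapolation for illustration.
    """
    lam, a, b, c = rank1_cp(tensor)
    recent = c[-window:]
    c_next = sum(recent) / len(recent)
    return [[lam * ai * bj * c_next for bj in b] for ai in a]


# Hypothetical counts: 2 nodes x 2 error types x 3 time windows.
counts = [[[n * e for _ in range(3)] for e in (1, 3)] for n in (1, 2)]
scores = forecast_next_window(counts)
```

Higher scores mark (node, error type) pairs where the fitted pattern expects more events in the next window. A realistic setup would use a higher rank and a proper temporal model, but the shape of the computation is the same: no training phase, just a factorization of the observed event counts.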