Alita: Comprehensive Performance Isolation through Bias Resource Management for Public Clouds
Machine Learning, Deep Learning and Artificial Intelligence
Performance/Productivity Measurement and Evaluation
Resource Management and Scheduling
Time: Tuesday, 17 November 2020, 3:30pm - 4pm EDT
Description: Tenants of public clouds share hardware resources on the same node, which creates the potential for performance interference (or malicious attacks). A tenant can significantly degrade the performance of its neighbors on the same node by overusing the shared memory bus, the last-level cache (LLC) and memory bandwidth, and power.
To eliminate such unfairness, we propose Alita, a runtime system consisting of an online interference identifier and an adaptive interference eliminator. The interference identifier monitors hardware and system-level event statistics to identify resource polluters. The eliminator improves the performance of normal applications by throttling the resource usage of polluters only. Specifically, Alita adopts bus lock sparsification, bias LLC/bandwidth isolation, and selective power throttling to restrict the resource usage of polluters. Results on an experimental platform and an in-production cloud demonstrate that Alita, relying on system-level knowledge, significantly improves the performance of co-located virtual machines in the presence of resource polluters.
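The identify-then-throttle loop described above can be illustrated with a minimal sketch. It is not Alita's implementation: the per-VM statistics (`VMStats` and its fields), the thresholds, and the action names (`sparsify_bus_locks`, `restrict_llc_and_bandwidth`, `throttle_power`) are all hypothetical placeholders. The sketch only shows the structure: flag a VM as a polluter for each resource whose sampled usage exceeds a threshold, then emit throttling actions for polluters while leaving normal VMs untouched.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the abstract does not give Alita's actual detection criteria.
BUS_LOCK_THRESHOLD = 1_000.0   # bus locks per second
MEM_BW_THRESHOLD_GBPS = 10.0   # memory bandwidth attributed to one VM, GB/s
POWER_THRESHOLD_W = 80.0       # power attributed to one VM, watts

# Hypothetical mapping from a polluted resource to a throttling action.
ACTIONS = {
    "bus": "sparsify_bus_locks",
    "llc_bw": "restrict_llc_and_bandwidth",
    "power": "throttle_power",
}


@dataclass
class VMStats:
    """Per-VM event statistics for one monitoring interval (assumed fields)."""
    name: str
    bus_locks_per_s: float
    mem_bw_gbps: float
    power_w: float


def identify_polluters(stats):
    """Return {vm_name: set of polluted resources} for VMs exceeding any threshold."""
    polluters = {}
    for s in stats:
        resources = set()
        if s.bus_locks_per_s > BUS_LOCK_THRESHOLD:
            resources.add("bus")
        if s.mem_bw_gbps > MEM_BW_THRESHOLD_GBPS:
            resources.add("llc_bw")
        if s.power_w > POWER_THRESHOLD_W:
            resources.add("power")
        if resources:
            polluters[s.name] = resources
    return polluters


def plan_throttling(stats):
    """Map every VM to a list of actions; VMs not flagged as polluters get none."""
    polluters = identify_polluters(stats)
    return {
        s.name: [ACTIONS[r] for r in sorted(polluters.get(s.name, ()))]
        for s in stats
    }


if __name__ == "__main__":
    sample = [
        VMStats("vm-normal", bus_locks_per_s=10, mem_bw_gbps=2.0, power_w=30),
        VMStats("vm-polluter", bus_locks_per_s=50_000, mem_bw_gbps=25.0, power_w=40),
    ]
    # Prints: {'vm-normal': [], 'vm-polluter': ['sparsify_bus_locks', 'restrict_llc_and_bandwidth']}
    print(plan_throttling(sample))
```

The point of the design is selectivity: only the VM that trips a threshold is throttled, and only on the resource it overuses, so normal co-located VMs keep their full allocation. In a real deployment the actions would need to map onto concrete hardware or OS controls, which this sketch deliberately leaves abstract.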