CPU bandwidth control, DVFS, RAPL on Intel Skylake CPU
9
Reactive capping policy: load shaping
10
Load shaping on a production cluster
11
Proactive capping mechanism: CPU jailing (deterministic machine CPU cap)
12
20% CPU jailing on a production cluster
13
Proactive capping policy: risk assessment
14
Deployed in logs processing clusters
15
Summary
Description:
Explore a conference talk on Thunderbolt, a hardware-agnostic power capping system designed for hyperscale data centers. Learn about the challenges of power oversubscription and the need for task-level quality-of-service differentiation in modern compute clusters. Discover how Thunderbolt ensures safe power oversubscription while minimizing impact on both throughput-oriented and latency-sensitive tasks. Examine the system's architecture, mechanisms, and policies, including its two-threshold control policy and use of CPU bandwidth control. Understand the benefits of Thunderbolt's reactive and proactive capping approaches, and see real-world deployment results in production clusters. Gain insights into power efficiency improvements and the potential for significant power oversubscription gains in data center environments.
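To make the idea of a two-threshold control policy more concrete, here is a minimal sketch in Python. The description only names the policy, so everything in the sketch is an assumption made for illustration: the class `TwoThresholdPolicy`, the threshold values, the multipliers, and the recovery step are hypothetical and are not taken from the talk. It assumes power above the high threshold triggers an aggressive cut to the CPU cap, power between the two thresholds triggers a gentle cut, and power below the low threshold lets the cap recover gradually.

```python
from dataclasses import dataclass


@dataclass
class TwoThresholdPolicy:
    """Illustrative two-threshold capping policy; all values are hypothetical."""

    high_watts: float               # above this, throttle aggressively
    low_watts: float                # between low and high, throttle gently
    hard_multiplier: float = 0.5    # assumed cap reduction above the high threshold
    soft_multiplier: float = 0.9    # assumed cap reduction between the thresholds
    recovery_step: float = 0.05     # assumed cap increase below the low threshold
    min_cap: float = 0.1            # assumed floor so tasks are never fully starved

    def next_cap(self, power_watts: float, cap: float) -> float:
        """Return the CPU cap (fraction of full CPU, 0..1) for the next interval."""
        if power_watts > self.high_watts:
            cap *= self.hard_multiplier
        elif power_watts > self.low_watts:
            cap *= self.soft_multiplier
        else:
            cap += self.recovery_step
        # Clamp the result to the allowed range [min_cap, 1.0].
        return max(self.min_cap, min(1.0, cap))


if __name__ == "__main__":
    policy = TwoThresholdPolicy(high_watts=1000.0, low_watts=900.0)
    cap = 1.0
    # Hypothetical power readings, one per control interval.
    for reading in [950.0, 1100.0, 1050.0, 880.0, 860.0]:
        cap = policy.next_cap(reading, cap)
        print(f"power={reading:7.1f} W -> cap={cap:.3f}")
```

In a deployed system the resulting cap would be applied to throughput-oriented tasks through a mechanism such as CPU bandwidth control, while latency-sensitive tasks are left alone; the sketch covers only the policy decision, not the enforcement.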
Thunderbolt - Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale