1. Intro
2. Motivation: power oversubscription and capping
3. Motivation: task QoS differentiation
4. Prior industry solutions did not meet our needs
5. Architecture
6. Mechanism and policy details
7. Why not RAPL or DVFS?
8. CPU bandwidth control, DVFS, RAPL on Intel Skylake CPU
9. Reactive capping policy: load shaping
10. Load shaping on a production cluster
11. Proactive capping mechanism: CPU jailing (deterministic machine CPU cap)
12. 20% CPU jailing on a production cluster
13. Proactive capping policy: risk assessment
14. Deployed in logs processing clusters
15. Summary
Description:
Explore a conference talk on Thunderbolt, a hardware-agnostic power capping system designed for hyperscale data centers. Learn about the challenges of power oversubscription and the need for task-level quality-of-service differentiation in modern compute clusters. Discover how Thunderbolt ensures safe power oversubscription while minimizing impact on both throughput-oriented and latency-sensitive tasks. Examine the system's architecture, mechanisms, and policies, including its two-threshold control policy and use of CPU bandwidth control. Understand the benefits of Thunderbolt's reactive and proactive capping approaches, and see real-world deployment results in production clusters. Gain insights into power efficiency improvements and the potential for significant power oversubscription gains in data center environments.

Thunderbolt - Throughput-Optimized, Quality-of-Service-Aware Power Capping at Scale

USENIX