1. Intro
2. Hardware for ML training is becoming highly specialized and heterogeneous!
3. How should we allocate heterogeneous resources?
4. Challenge 1: Heterogeneous performance
5. Challenge 2: Diverse scheduling objectives
6. Related work
7. Gavel: A new heterogeneity-aware cluster scheduler
8. Scheduling policies to be made heterogeneity-aware
9. Policies as optimization problems
10. Allocations (x) as time fractions
11. Effective throughput
12. Performance optimizations: space sharing and placement
13. How do we realize an optimal allocation?
14. Gavel's round-based scheduling
15. Main questions
16. Gavel improves objectives on a heterogeneous cluster
17. Gavel can enable the same heterogeneous cluster to support higher input load
18. Gavel can support hierarchical policies
19. Gavel scales to clusters with hundreds of active jobs
20. Conclusion
Description:
Explore a 20-minute conference talk from OSDI '20 that delves into Gavel, a novel heterogeneity-aware scheduler for deep learning workloads. Learn how Gavel addresses the challenges of heterogeneous performance across specialized accelerators and diverse scheduling objectives in cluster management. Discover the concept of effective throughput and how it is used to transform existing scheduling policies into heterogeneity-aware versions. Understand Gavel's round-based scheduling mechanism and its ability to optimize resource allocation in heterogeneous clusters. Examine the performance improvements Gavel offers, including the ability to sustain higher input load and significant gains in makespan and average job completion time compared to heterogeneity-agnostic policies.
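The description mentions Gavel's central abstraction, effective throughput: the throughput a job achieves when its allocation is expressed as time fractions across accelerator types. A minimal sketch of that computation, assuming an illustrative throughput matrix T and allocation matrix X (names and numbers are hypothetical, not from the talk):

```python
# Hedged sketch of the "effective throughput" idea from the talk.
# T[m][j]: throughput of job m on accelerator type j (e.g., samples/sec).
# X[m][j]: fraction of wall-clock time job m is allocated on type j.
# All names and values below are illustrative assumptions.

def effective_throughput(T, X):
    """Per-job effective throughput: sum over types j of T[m][j] * X[m][j]."""
    return [sum(t * x for t, x in zip(T_m, X_m)) for T_m, X_m in zip(T, X)]

# Two jobs, two accelerator types (say, a newer and an older GPU).
T = [[100.0, 40.0],   # job 0 runs 2.5x faster on type 0
     [50.0, 45.0]]    # job 1 barely benefits from type 0
X = [[0.6, 0.0],      # job 0: 60% of time on type 0, idle otherwise
     [0.2, 0.8]]      # job 1: mostly scheduled on type 1
print(effective_throughput(T, X))  # [60.0, 46.0]
```

A heterogeneity-aware policy can then optimize an objective (e.g., max-min fairness) over these effective throughputs rather than over raw resource shares.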

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

USENIX