1. Intro
2. Hardware for ML training is becoming highly specialized and heterogeneous!
3. How should we allocate heterogeneous resources?
4. Challenge 1: Heterogeneous performance
5. Challenge 2: Diverse scheduling objectives
6. Related work
7. Gavel: A new heterogeneity-aware cluster scheduler
8. Scheduling policies to be made heterogeneity-aware
9. Policies as optimization problems
10. Allocations (x) as time fractions
11. Effective throughput
12. Performance optimizations: space sharing and placement
13. How do we realize an optimal allocation?
14. Gavel's round-based scheduling
15. Main questions
16. Gavel improves objectives on a heterogeneous cluster
17. Gavel can enable the same heterogeneous cluster to support higher input load
18. Gavel can support hierarchical policies
19. Gavel scales to clusters with hundreds of active jobs
20. Conclusion
Description:
Explore a 20-minute conference talk from OSDI '20 that delves into Gavel, a novel heterogeneity-aware scheduler for deep learning workloads. Learn how Gavel addresses the challenges of heterogeneous performance across specialized accelerators and diverse scheduling objectives in cluster management. Discover the concept of effective throughput and how it is used to transform existing scheduling policies into heterogeneity-aware versions. Understand Gavel's round-based scheduling mechanism and its ability to optimize resource allocation in heterogeneous clusters. Examine the performance improvements Gavel offers, including the ability to sustain higher input load and significant gains in makespan and average job completion time compared to heterogeneity-agnostic policies.
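The description mentions Gavel's central abstraction, effective throughput: the throughput a job achieves when its allocation is expressed as time fractions across accelerator types. A minimal sketch of that computation, assuming an illustrative throughput matrix T and allocation matrix X (names and numbers are hypothetical, not from the talk):

```python
# Hedged sketch of the "effective throughput" idea from the talk.
# T[m][j]: throughput of job m on accelerator type j (e.g., samples/sec).
# X[m][j]: fraction of wall-clock time job m is allocated on type j.
# All names and values below are illustrative assumptions.

def effective_throughput(T, X):
    """Per-job effective throughput: sum over types j of T[m][j] * X[m][j]."""
    return [sum(t * x for t, x in zip(T_m, X_m)) for T_m, X_m in zip(T, X)]

# Two jobs, two accelerator types (say, a newer and an older GPU).
T = [[100.0, 40.0],   # job 0 runs 2.5x faster on type 0
     [50.0, 45.0]]    # job 1 barely benefits from type 0
X = [[0.6, 0.0],      # job 0: 60% of time on type 0, idle otherwise
     [0.2, 0.8]]      # job 1: mostly scheduled on type 1
print(effective_throughput(T, X))  # [60.0, 46.0]
```

A heterogeneity-aware policy can then optimize an objective (e.g., max-min fairness) over these effective throughputs rather than over raw resource shares.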

Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

USENIX