1. Intro
2. Kubeflow Platform for ML
3. ML Workflow @ Spotify
4. Kubeflow clusters
5. Kubeflow platform stats
6. Our solutions
7. Team based multi-tenancy
8. Team profile management
9. Team profile example
10. Existing setup & problems
11. Multi-cluster strategy illustration
12. Benefits of multiple clusters
13. Multi-cluster implementation
14. Multi-cluster based Kubeflow Platform
15. New challenges
16. Solution: ArgoCD
17. Multi-cluster CD
18. Multi-cluster reliability
19. "Infrastructure"-as-code
20. SLO Tracking
21. Telemetry and Metrics
22. kubeflow-state-metrics
23. Infrastructure and Infrastructui
24. Product Identity
25. Expanded Observability
26. Expanded "On-Cluster" Compute
27. Self-Service
28. Open Source!
Description:
Explore Spotify's journey in scaling Kubeflow for multi-tenancy in this conference talk. Learn how the company addressed challenges of increased adoption and complex machine learning experiments while ensuring cluster reliability and equitable resource access. Discover Spotify's streamlined tooling for maintaining, deploying, and monitoring their Kubeflow distribution. Gain insights into their multi-cluster approach, team-based multi-tenancy strategies, and implementation of infrastructure-as-code. Understand how they tackled new challenges using ArgoCD, improved observability, and expanded on-cluster compute capabilities. Delve into their focus on SLO tracking, telemetry, and metrics, as well as their efforts to enhance product identity and promote self-service. Get a glimpse of Spotify's future plans for their Kubeflow platform and their commitment to open-source contributions in the field of machine learning infrastructure.

Scaling Kubeflow for Multi-tenancy at Spotify

CNCF [Cloud Native Computing Foundation]