1. Introduction
2. Introductions
3. What is CERN
4. Motivation for our service
5. Reconstruction
6. Simulations
7. Goals
8. Platform
9. Cluster Layout
10. Deployment
11. Integrations
12. Issues
13. Burst to Public Clouds
14. Automating Distributed Training
15. Service Dashboard
16. Demo
17. Submitting jobs
18. Results
19. Closing remarks
Description:
Explore the journey of building and managing a centralized machine learning platform using Kubeflow at CERN in this 31-minute conference talk. Discover how CERN leverages ML solutions for various challenges, including particle classification, simulation data generation, and beam calibration. Learn about the recently introduced centralized service that handles data preparation, model training, and serving while optimizing resource usage for different types of accelerators. Gain insights into CERN's experience with Kubeflow on Kubernetes, their integration of on-premises resources, and potential extensions to public clouds. Delve into topics such as cluster layout, deployment strategies, integrations, and automation of distributed training. Witness a demo of job submission and results, and understand the motivations behind CERN's ML platform development.

Building and Managing a Centralized ML Platform with Kubeflow at CERN

CNCF [Cloud Native Computing Foundation]