1. Intro
2. 4 things you can do for more reliable ML
3. ML on one machine
4. ML in production
5. What makes ML in prod interesting
6. What goes wrong?
7. 4 things for more reliable ML
8. ML outages from the outside
9. Where changes happen: binaries
10. Where changes happen: configuration
11. Validating binary and config changes
12. Where changes happen: data
13. Validating data updates
14. Improving data integrity
15. Handling pipeline backlogs
Description:
This talk covers how to make machine learning more reliable in production. It surveys common failure modes in large-scale ML systems and best practices for productionization: monitoring systems, protecting against human error, ensuring data integrity, and managing pipeline workloads efficiently. It examines where changes enter a production ML system (binaries, configuration, and data updates) and how to handle pipeline backlogs. Throughout, it takes an outside-in approach to ML reliability, drawing on experience with a large-scale ML production platform at Google.

Demystifying Machine Learning in Production - Reasoning about a Large-Scale ML Platform

USENIX