Outline:
1. Intro
2. Inference Stack Evolution: PyTorch
3. Model explanation, model pre/post transformers
4. GPU Autoscaling: the challenge
5. Challenge: Increase GPU utilization
6. Use Case: Personalized News Monitoring
7. Challenge: Deploy many models
8. Proposed Solution: Multi-model Inference Service
9. Experience from running a serverless inference platform
10. Reduce tail latency caused by CPU throttling
11. Reduce cold start latency
12. Monitoring and Alerting: Control Plane
13. Monitoring and Alerting: Access logs
14. Monitoring and Alerting: Inference Service metrics
15. KFServing Roadmap 2020
16. Our Working Group is Open
Description:
Explore serverless machine learning inference with KFServing in this conference talk. Learn how KFServing leverages KNative to simplify deployment of standard and custom ML models, enabling auto-scaling and scale-to-zero functionality. Discover the challenges of integrating serverless technology with ML inference, particularly in latency-critical environments. Gain insights into the evolution of inference stacks, GPU autoscaling challenges, and strategies to improve GPU utilization. Examine a real-world use case of personalized news monitoring and the proposed multi-model inference service solution. Delve into practical experiences of running a serverless inference platform, including techniques to reduce tail latency and cold start latency. Understand the importance of monitoring and alerting for control plane, access logs, and inference service metrics. Get a glimpse of the KFServing roadmap for 2020 and learn about the open working group driving its development.
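The description above notes that KFServing reduces model deployment to a single Kubernetes resource, with KNative providing autoscaling and scale-to-zero underneath. As an illustration only (not code from the talk), the sketch below creates an InferenceService for the public sklearn-iris sample model using the Kubernetes Python client; the v1alpha2 API group reflects KFServing as of 2020, and the service name and namespace are illustrative assumptions.

from kubernetes import client, config

# Assumes a cluster with KFServing installed and local kubeconfig credentials.
config.load_kube_config()

# The whole deployment is one custom resource; KNative handles routing,
# autoscaling, and scale-to-zero for the predictor underneath.
inference_service = {
    "apiVersion": "serving.kubeflow.org/v1alpha2",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris", "namespace": "default"},
    "spec": {
        "default": {
            "predictor": {
                "sklearn": {
                    "storageUri": "gs://kfserving-samples/models/sklearn/iris"
                }
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kubeflow.org",
    version="v1alpha2",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)

Once the resource reports ready, requests to its endpoint are routed to the predictor pods, which scale with traffic and can scale down to zero when the endpoint is idle.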

Serverless Machine Learning Inference with KFServing

CNCF [Cloud Native Computing Foundation]