1. Introduction
2. Background about KServe
3. Milestones
4. Model Development
5. Challenges
6. KServe
7. KServe Components
8. Standard Inference Protocol
9. HTTP Protocol
10. GRPC Protocol
11. New Scalability Problem
12. Current Approach
13. Problem
14. Compute resource limitations
15. Maximum pod limitations
16. Maximum IP address limitations
17. Model Mesh Solution
18. Performance Test
19. Latency Test
20. Model Mesh
21. Roadmap
22. Questions
23. Original Design
Description:
Explore the scalable deployment of machine learning models using KServe in this conference talk. Learn about the Multi-Model Serving solution designed to address limitations in the 'one model, one service' paradigm, including resource constraints, pod limitations, and IP address restrictions. Discover how KServe enables efficient GPU utilization for multiple models, and gain insights into its components, standard inference protocols, and performance benchmarks. Understand the evolution from KFServing to KServe, the challenges in model development, and the roadmap for future improvements. Dive into the design of Multi-Model Serving and its implementation across different frameworks, showcasing its potential to revolutionize machine learning model deployment at scale.
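For context on the "standard inference protocols" the talk covers, below is a minimal sketch of a client request against KServe's V2 (Open Inference Protocol) HTTP endpoint. The host name and model name here are hypothetical placeholders, not taken from the talk.

```python
import requests

# Hypothetical endpoint exposed by a KServe InferenceService;
# /v2/models/{name}/infer is the Open Inference Protocol (V2) path.
URL = "http://my-model.example.com/v2/models/my-model/infer"

# V2 request body: each input tensor carries a name, shape, datatype, and flat data.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [6.8, 2.8, 4.8, 1.4],
        }
    ]
}

response = requests.post(URL, json=payload, timeout=10)
response.raise_for_status()

# The response mirrors the request: a list of named output tensors.
for output in response.json()["outputs"]:
    print(output["name"], output["data"])
```

The gRPC flavor of the same protocol, also covered in the talk, exchanges equivalent tensors as protobuf messages rather than JSON.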

Serving Machine Learning Models at Scale Using KServe

CNCF [Cloud Native Computing Foundation]