Introduction

Background about KServe

Milestones

Model Development

Challenges

KServe

KServe Components

Standard Inference Protocol

HTTP Protocol

gRPC Protocol

New Scalability Problem

Current Approach

Problem

Compute resource limitations

Maximum pod limitations

Maximum IP address limitations

Model Mesh Solution

Performance Test

Latency Test

Model Mesh

Roadmap

Questions

Original Design

Description:

This conference talk covers deploying machine learning models at scale with KServe. It introduces the Multi-Model Serving solution, designed to address the limits of the 'one model, one service' paradigm: compute resource constraints, maximum pod limits, and maximum IP address limits. The talk explains how KServe lets multiple models share GPUs efficiently, and walks through its components, its standard inference protocol (HTTP and gRPC), and its performance and latency tests. It also covers the evolution from KFServing to KServe, the challenges in model development, the roadmap for future improvements, and the design of Multi-Model Serving and its implementation across different frameworks.

Serving Machine Learning Models at Scale Using KServe

CNCF [Cloud Native Computing Foundation]

#Conference Talks #Computer Science #DevOps #Kubernetes #Software Engineering #Scalability #Programming #Cloud Computing #Serverless Computing #Machine Learning #Model Development #Software Development #Software Testing #Performance Testing #KServe