1. Introduction
2. Outline
3. What is Model Serving
4. Deploying Models as Microservices
5. Project KServe
6. Pod Per Model Paradigm
7. ModelMesh
8. ModelMesh Architecture
9. ModelMesh Architecture Overview
10. Monitoring ModelMesh
11. Prometheus
12. ModelMesh Dashboard
13. Cache Miss Rate
14. Serving Runtimes
15. Example
16. Custom Runtime
17. Inference Service
18. Demo
19. Contact Information
20. Questions
Description:
Explore the scalable deployment of AI models on Kubernetes using ModelMesh, the multi-model serving backend for KServe, in this conference talk. Learn how to overcome resource limitations and efficiently manage numerous models at scale. Discover ModelMesh's distributed LRU cache for intelligent model loading and unloading, as well as its routing capabilities for balancing inference requests. Gain insights into the latest major release (v0.10) and its integration with KServe. Understand the advantages of ModelMesh's small control-plane footprint and its ability to host multiple models while maximizing cluster resources and minimizing costs. Explore newly-supported model runtimes like TorchServe and the capability for runtime-sharing across namespaces. Dive into the ModelMesh architecture, monitoring techniques using Prometheus, and practical examples of custom runtimes and inference services through a live demonstration.
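As a rough sketch of the inference-service example mentioned above, the Python snippet below registers a model with ModelMesh by creating a KServe InferenceService carrying the ModelMesh deployment-mode annotation, using the Kubernetes client. The model name, namespace, storage key, and object path are illustrative placeholders, not values from the talk.

```python
# Illustrative sketch: create a KServe InferenceService served by ModelMesh.
# Resource names, namespace, and storage details below are placeholders.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "example-sklearn-isvc",        # placeholder name
        "namespace": "modelmesh-serving",      # placeholder namespace
        "annotations": {
            # Route this InferenceService to ModelMesh rather than a
            # dedicated pod-per-model deployment.
            "serving.kserve.io/deploymentMode": "ModelMesh",
        },
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storage": {
                    "key": "localMinIO",                 # storage-config secret entry (placeholder)
                    "path": "sklearn/mnist-svm.joblib",  # object path in the bucket (placeholder)
                },
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="modelmesh-serving",
    plural="inferenceservices",
    body=inference_service,
)
```

Once created, ModelMesh decides which serving-runtime pod loads the model and routes inference requests to it, loading and unloading copies according to its distributed LRU cache.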

ModelMesh: Scalable AI Model Serving on Kubernetes

Linux Foundation