Description:

Explore a conference talk on SLA-aware machine learning inference serving on serverless computing platforms. Delve into the challenges of serving machine learning inference workloads in production environments and the complexities of meeting SLA requirements while optimizing infrastructure costs. Learn about MLProxy, an adaptive reverse proxy designed to support efficient machine learning serving workloads on serverless systems. Discover how MLProxy utilizes adaptive batching to ensure SLA compliance and optimize serverless costs. Examine the results of rigorous experiments conducted on Knative, demonstrating MLProxy's ability to significantly reduce serverless deployment costs and SLA violations across various model serving frameworks.

SLA-Aware Machine Learning Inference Serving on Serverless Computing Platforms

MLOps World: Machine Learning in Production

#Computer Science #Machine Learning #Programming #Cloud Computing #DevOps #Kubernetes #Knative #MLOps #Serverless Computing
