Building an Auto-scaling AI Inference Service - From Setup to Deployment

Learn to build and deploy a scalable AI inference service in this comprehensive video tutorial. It covers AI inference scaling from fundamentals to advanced implementation: comparing inference approaches, understanding GPU utilization patterns, and setting up one-click templates for streamlined deployment. You will configure a Docker image, design an auto-scaling service architecture, and tune model configuration settings for performance. The tutorial then walks through load-testing methodology and metric analysis, implements a scaling manager, and configures API endpoints for integration, following industry best practices. Hands-on examples and real-world scenarios run throughout each module, and the final module surveys future developments and advanced topics in AI inference services.
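The scaling-manager component described above can be sketched as a simple control loop that picks a replica count from observed GPU utilization. This is a minimal illustration, not the tutorial's actual implementation: the `ScalingManager` and `ScalingPolicy` names, the proportional scaling rule, and the target-utilization parameter are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy knobs for the example scaling rule."""
    min_replicas: int = 1
    max_replicas: int = 8
    target_util: float = 0.7  # desired average GPU utilization per replica


class ScalingManager:
    """Chooses a replica count so per-replica GPU utilization
    approaches the policy's target (a common proportional rule)."""

    def __init__(self, policy: ScalingPolicy) -> None:
        self.policy = policy

    def desired_replicas(self, current: int, avg_util: float) -> int:
        # Proportional rule: if replicas run hotter than target, scale up;
        # if they run cooler, scale down, clamped to the policy bounds.
        desired = math.ceil(current * avg_util / self.policy.target_util)
        return max(self.policy.min_replicas,
                   min(self.policy.max_replicas, desired))


mgr = ScalingManager(ScalingPolicy())
print(mgr.desired_replicas(current=2, avg_util=0.95))  # → 3 (scale up)
print(mgr.desired_replicas(current=4, avg_util=0.20))  # → 2 (scale down)
```

In practice this decision function would be driven by metrics scraped from the GPUs (the "GPU utilization patterns" the tutorial analyzes) and fed to whatever orchestrator launches or retires inference replicas.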