1. Serving a model for 100 customers
2. Video Overview
3. Choosing a server
4. Choosing software to serve an API
5. One-click templates
6. Tips on GPU selection
7. Using quantization to fit in a cheaper GPU
8. Vast.ai setup
9. Serve Mistral with vLLM and AWQ, incl. concurrent requests
10. Serving a function calling model
11. API speed tests, including concurrent
12. Video Recap
Description:
Learn how to serve a custom Large Language Model (LLM) for over 100 customers in this comprehensive 52-minute video tutorial. Explore key topics including server selection, API software choices, and GPU optimization techniques. Discover one-click templates for easy implementation and gain insights on using quantization to maximize GPU efficiency. Follow along with a step-by-step Vast.ai setup and learn to serve Mistral with vLLM and AWQ, handling concurrent requests. Delve into function calling models and conduct API speed tests. Master the skills needed to efficiently deploy and manage LLMs for multiple users in this informative and practical guide.
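
As background for the quantization and vLLM chapters above: a 7B-parameter model at 16-bit weights needs roughly 14 GB for the weights alone, while 4-bit AWQ cuts that to roughly 4 GB, leaving headroom for the KV cache on a single 24 GB GPU. Below is a minimal sketch of loading an AWQ-quantized Mistral with vLLM's offline Python API; the model ID and memory setting are assumptions for illustration, not values taken from the video.

```python
# Minimal sketch: load an AWQ-quantized Mistral with vLLM's offline API.
# The model ID below is an assumption (any AWQ checkpoint on the Hub works).
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed AWQ checkpoint
    quantization="awq",           # use vLLM's AWQ kernels
    gpu_memory_utilization=0.90,  # assumed: leave a little headroom on the GPU
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain KV-cache paging in one sentence."], params)
print(outputs[0].outputs[0].text)
```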
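For the concurrent API speed tests the video covers, a hedged sketch of one approach is shown below, firing simultaneous requests at an OpenAI-compatible endpoint (such as vLLM's built-in server) with the openai Python client. The base URL, model name, and request count are assumptions.

```python
# Sketch of a concurrent latency test against a local OpenAI-compatible API.
import asyncio
import time

from openai import AsyncOpenAI

# Assumed endpoint: vLLM's OpenAI-compatible server on localhost:8000.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

async def one_request(i: int) -> float:
    """Send one chat completion and return its wall-clock latency."""
    start = time.perf_counter()
    await client.chat.completions.create(
        model="TheBloke/Mistral-7B-Instruct-v0.2-AWQ",  # assumed model name
        messages=[{"role": "user", "content": f"Say hello, request {i}."}],
        max_tokens=32,
    )
    return time.perf_counter() - start

async def main(n: int = 16) -> None:
    # Launch n requests at once to exercise the server's batching.
    latencies = await asyncio.gather(*(one_request(i) for i in range(n)))
    print(f"{n} concurrent requests, mean latency {sum(latencies) / n:.2f}s")

asyncio.run(main())
```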

Serve a Custom LLM for Over 100 Customers - GPU Selection, Quantization, and API Setup

Trelis Research