1. How to pick a GPU and software for inference
2. Video Overview
3. Effect of Quantization on Quality
4. Effect of Quantization on Speed
5. Effect of GPU bandwidth relative to model size
6. Effect of de-quantization on inference speed
7. Marlin Kernels, AWQ and Neural Magic
8. Inference Software - vLLM, TGI, SGLang, NIM
9. Deploying one-click templates for inference
10. Testing inference speed for a batch size of 1 and 64
11. SGLang inference speed
12. vLLM inference speed
13. Text Generation Inference Speed
14. Nvidia NIM Inference Speed
15. Comparing vLLM, SGLang, TGI and NIM Inference Speed
16. Comparing inference costs for A40, A6000, A100 and H100
17. Inference Setup for Llama 3.1 70B and 405B
18. Running inference on Llama 8B on A40, A6000, A100 and H100
19. Inference cost comparison for Llama 8B
20. Running inference on Llama 70B and 405B on A40, A6000, A100 and H100
21. Inference cost comparison for Llama 70B and 405B
22. OpenAI GPT4o Inference Costs versus Llama 3.1 8B, 70B, 405B
23. Final Inference Tips
24. Resources
Description:
Dive into a comprehensive video tutorial on selecting the right GPU and inference engine for machine learning projects. Learn about the impact of quantization on model quality and speed, the relationship between GPU bandwidth and model size, and the effect of de-quantization on inference speed. Explore advanced topics such as Marlin kernels, AWQ, and Neural Magic. Compare popular inference software, including vLLM, TGI, SGLang, and NIM, and learn how to deploy one-click templates for inference. Analyze detailed performance comparisons across GPUs (A40, A6000, A100, H100) and model sizes (Llama 3.1 8B, 70B, 405B), including cost considerations, and see how OpenAI GPT4o inference costs compare with the Llama models. The tutorial concludes with practical tips for optimizing inference setups and additional resources for further learning.
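The bandwidth-versus-model-size relationship covered in the tutorial can be sketched with a common back-of-envelope rule of thumb (this sketch is not taken from the video itself): at batch size 1, decoding is memory-bandwidth bound, so the token rate is roughly GPU memory bandwidth divided by the bytes needed to read the weights once. The ~2,000 GB/s figure below is an assumed A100 80GB SXM bandwidth.

```python
def estimated_tokens_per_sec(params_billions: float,
                             bytes_per_param: float,
                             bandwidth_gb_s: float) -> float:
    """Rough upper bound on single-stream decode speed, assuming the
    decode step is limited by reading all weights once per token."""
    model_gb = params_billions * bytes_per_param  # weight footprint in GB
    return bandwidth_gb_s / model_gb

# Llama 3.1 8B in FP16 (2 bytes/param) on an assumed ~2,000 GB/s GPU:
print(estimated_tokens_per_sec(8, 2, 2000))    # ~125 tokens/s ceiling
# The same model quantized to ~4-bit (0.5 bytes/param) raises the ceiling 4x:
print(estimated_tokens_per_sec(8, 0.5, 2000))  # ~500 tokens/s ceiling
```

This also shows why quantization speeds up small-batch inference (fewer bytes to read per token) and why a model that barely fits a GPU's memory runs slowly relative to that GPU's bandwidth.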

How to Pick a GPU and Inference Engine for Large Language Models

Trelis Research