Chapters:
1. Intro
2. Fine-tuning recap
3. LLMs are computationally expensive
4. What is Quantization?
5. 4 Ingredients of QLoRA
6. Ingredient 1: 4-bit NormalFloat
7. Ingredient 2: Double Quantization
8. Ingredient 3: Paged Optimizer
9. Ingredient 4: LoRA
10. Bringing it all together
11. Example code: Fine-tuning Mistral-7b-Instruct for YT Comments
12. What's Next?
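
The quantization idea behind chapter 4 is just mapping floating-point weights onto a small integer grid and scaling back on use. A minimal absmax int8 sketch (illustrative only, not the video's code; QLoRA itself uses a 4-bit NormalFloat grid instead of int8):

```python
import numpy as np

def absmax_quantize(x: np.ndarray):
    """Map floats to int8 by scaling so the largest magnitude hits 127."""
    scale = 127 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate floats from the int8 grid."""
    return q.astype(np.float32) / scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, s = absmax_quantize(w)
w_hat = dequantize(q, s)
# w_hat is close to w, but each value now costs 1 byte instead of 4
```

The same trade-off (memory for a small rounding error) is what lets a 7B-parameter model fit on a single GPU.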
Description:
Learn how to fine-tune a large language model (LLM) using QLoRA (Quantized Low-rank Adaptation) on a single GPU in this comprehensive 37-minute video tutorial. Explore the four key ingredients of QLoRA: 4-bit NormalFloat, Double Quantization, Paged Optimizer, and LoRA. Follow along with example Python code to train a custom YouTube comment responder using Mistral-7b-Instruct. Gain insights into quantization techniques, computational efficiency, and practical implementation. Access additional resources including a series playlist, related videos, blog post, Colab notebook, GitHub repository, and Hugging Face model and dataset links for further learning and experimentation.
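
As an orientation before watching: the four ingredients map onto the Hugging Face transformers/peft/bitsandbytes stack roughly as below. This is a sketch, not the video's exact code; the checkpoint name and LoRA hyperparameters are assumptions.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# Ingredients 1 & 2: store base weights as 4-bit NormalFloat ("nf4"), and
# quantize the per-block quantization constants themselves (double quantization)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Ingredient 4: train only small low-rank adapter matrices
# (r, lora_alpha, target_modules here are illustrative choices)
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Loading and training would then look like (not run here; checkpoint name assumed):
# model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-Instruct-v0.1",
#     quantization_config=bnb_config, device_map="auto")
# model = get_peft_model(model, lora_config)
# Ingredient 3: a paged optimizer spills optimizer state to CPU RAM on memory
# spikes, e.g. optim="paged_adamw_8bit" in TrainingArguments.
```

The key design point: the frozen base model sits in 4-bit memory while gradients flow only through the small bf16 LoRA adapters, which is what makes single-GPU fine-tuning of a 7B model feasible.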

QLoRA - How to Fine-tune an LLM on a Single GPU with Python Code

Shaw Talebi