Effect of activations, model context and batch size on VRAM
Tip for GPU setup: start with a small batch size
Reducing VRAM with LoRA and quantisation
Quality trade-offs with quantisation and LoRA
Choosing between MP, DDP and FSDP
Distributed Data Parallel (DDP)
Model Parallel (MP) and Fully Sharded Data Parallel (FSDP)
Trade-offs with DDP and FSDP
How does DeepSpeed compare to FSDP?
Using FSDP and DeepSpeed with Accelerate
Code examples for MP, DDP and FSDP
Using SSH with rented GPUs (Runpod)
Installation
Slight detour: setting a username and email for GitHub
Basic Model Parallel (MP) fine-tuning script
Fine-tuning script with Distributed Data Parallel (DDP)
Fine-tuning script with Fully Sharded Data Parallel (FSDP)
Running 'accelerate config' for FSDP
Saving a model after FSDP fine-tuning
Quick demo of a complete FSDP LoRA training script
Quick demo of an inference script after training
Wrap up
Description:
Dive into the world of multi-GPU fine-tuning with this comprehensive tutorial on Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP) techniques. Learn how to optimize VRAM usage, understand the intricacies of the Adam optimizer, and explore the trade-offs between various distributed training methods. Gain practical insights on choosing the right GPU setup, implementing LoRA and quantization for VRAM reduction, and utilizing tools like DeepSpeed and Accelerate. Follow along with code examples for Model Parallel, DDP, and FSDP implementations, and discover how to set up and use rented GPUs via SSH. By the end of this tutorial, you'll be equipped with the knowledge to efficiently fine-tune large language models across multiple GPUs.
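
To make the workflow concrete, below is a minimal sketch (not taken from the video itself) of a LoRA fine-tuning script driven by Hugging Face Accelerate, so the same code runs under DDP or FSDP depending on the answers given to 'accelerate config'. The model name, dataset and hyperparameters are illustrative placeholders only.

# Minimal LoRA fine-tuning sketch using Hugging Face Accelerate.
# Assumptions: model, dataset and hyperparameters are placeholders, not the video's own.
import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the DDP/FSDP settings chosen in 'accelerate config'

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed small model for the demo
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach LoRA adapters so only a small fraction of the weights is trained.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)

# Tiny public dataset purely for illustration.
dataset = load_dataset("Abirate/english_quotes", split="train[:200]")

def tokenize(batch):
    tokens = tokenizer(batch["quote"], truncation=True, padding="max_length", max_length=128)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
dataset.set_format("torch")
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)  # start with a small batch size

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

model.train()
for step, batch in enumerate(dataloader):
    loss = model(**batch).loss
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()
    if step % 10 == 0:
        accelerator.print(f"step {step} loss {loss.item():.4f}")

# Gather the (possibly FSDP-sharded) weights onto rank 0 before saving the adapter.
accelerator.wait_for_everyone()
state_dict = accelerator.get_state_dict(model)
if accelerator.is_main_process:
    accelerator.unwrap_model(model).save_pretrained("lora-out", state_dict=state_dict)

After answering 'accelerate config' (choosing DDP or FSDP and the number of GPUs), a script like this would be started on the rented multi-GPU machine with 'accelerate launch train.py'.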