
Intro - Open Pretrained Transformer

Setup: creating the conda env

Setup: patching the code

Collecting train script arguments

Training script walk-through

Constructing a dummy task

Building the transformer model

CUDA kernels (C++ code)

Preparing a dummy dataset

Training loop

Zero grad, loss scaling

Forward pass through a transformer

IMPORTANT: loss, scaling, mixed precision, error handling

Outro

Description:

Dive deep into the metaseq codebase behind Meta's large language model OPT-175B in this comprehensive video tutorial. Learn how to set up the code on your machine and explore key concepts of mixed precision training, including loss scaling and unscaling. Follow along as the instructor walks through the training script, constructs dummy tasks and datasets, builds the transformer model, and examines CUDA kernels in C++ code. Gain insights into the training loop, forward pass through a transformer, and crucial aspects of loss handling, scaling, and mixed precision. Perfect for those looking to understand the intricacies of large language model implementation and training.
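To make "loss scaling and unscaling" concrete, here is a minimal, framework-free sketch of dynamic loss scaling as it is commonly done in fp16 training. It is not metaseq's code. The class name `DynamicLossScaler`, its parameters (`init_scale`, `scale_factor`, `scale_window`, `min_scale`) and the use of plain Python floats in place of gradient tensors are all illustrative assumptions. The loss is multiplied by a large scale so that small fp16 gradients do not underflow. If any scaled gradient overflows to inf/NaN, the step is skipped and the scale is reduced. Otherwise the gradients are divided by the same scale before the optimizer step, and the scale is increased again after a window of overflow-free steps.

```python
import math


class DynamicLossScaler:
    """Hypothetical sketch of dynamic loss scaling (not metaseq's implementation)."""

    def __init__(self, init_scale=2.0**15, scale_factor=2.0,
                 scale_window=2000, min_scale=1e-4):
        self.scale = init_scale          # current multiplier applied to the loss
        self.scale_factor = scale_factor # how much to shrink/grow the scale
        self.scale_window = scale_window # good steps required before growing
        self.min_scale = min_scale       # lower bound so the scale never hits 0
        self._good_steps = 0

    def scale_loss(self, loss):
        # Scale up the loss so backprop produces larger (non-underflowing) grads.
        return loss * self.scale

    def unscale_grads(self, scaled_grads):
        # Divide by the same factor to recover the true gradients.
        return [g / self.scale for g in scaled_grads]

    def update(self, scaled_grads):
        """Check scaled grads for overflow; return True if the step should run."""
        if any(not math.isfinite(g) for g in scaled_grads):
            # Overflow (inf/NaN): skip this optimizer step and shrink the scale.
            self.scale = max(self.scale / self.scale_factor, self.min_scale)
            self._good_steps = 0
            return False
        self._good_steps += 1
        if self._good_steps % self.scale_window == 0:
            # A full window without overflow: try a larger scale again.
            self.scale *= self.scale_factor
        return True
```

In a real training loop the check runs on the scaled gradients right after the backward pass, and only if `update` returns True are the gradients unscaled and handed to the optimizer. Growing the scale again after a quiet window is what keeps it close to the largest value that does not overflow.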

Open Pretrained Transformer - ML Coding Series

Aleksa Gordić - The AI Epiphany


#Computer Science #Machine Learning #Programming #Programming Languages #C++ #Software Development #CUDA