Introduction

Material

Underlying Technology

Primary Stability

Other Parameters

Methodology

Training Curves

Summary

Intensive vs Extensive Properties

Extensive vs Intensive Properties

The Plan

Example

General Tuning

Experimental Results

BIRD

Evaluation Results

Vertical Foundation

Primarization

Theory of Everything

Description:

This MIT seminar explains how GPT-3 hyperparameters can be tuned on a single GPU through zero-shot hyperparameter transfer. The talk introduces the maximal update parametrization (µP), under which narrow and wide neural networks share the same optimal hyperparameters, so values tuned on a small model carry over to a large one without retuning. With this method, the 6.7 billion parameter version of GPT-3 was tuned using only 7% of its pretraining compute budget. The talk also covers the theoretical foundations behind µP's properties and its connection to infinite-width neural networks and the Tensor Programs theory. The speaker is Greg Yang, a scientist at Microsoft Research, who presents findings from his research paper. Suitable for both general machine learning practitioners and those interested in the theoretical aspects of neural networks.
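As a rough illustration of what transferring a hyperparameter across widths can look like, the sketch below rescales an Adam learning rate tuned on a narrow proxy model for use on a wider model. It assumes the commonly cited µP rule for Adam, in which learning rates for hidden and output weight matrices shrink in proportion to width while input weights and biases keep the tuned value. The function name, group names, and numeric values are hypothetical, and the sketch leaves out the initialization and attention scaling that a full µP setup also requires.

```python
def mup_adam_lrs(base_lr, base_width, width):
    """Per-group Adam learning rates under an assumed muP-style width rule.

    base_lr    -- learning rate tuned on the narrow proxy model
    base_width -- hidden width of the proxy model
    width      -- hidden width of the target model
    """
    if base_width <= 0 or width <= 0:
        raise ValueError("widths must be positive")
    ratio = base_width / width  # below 1 when the target model is wider
    return {
        # Assumed rule: input weights and all biases keep the tuned rate.
        "input_and_biases": base_lr,
        # Assumed rule: hidden and output weight matrices scale as 1/width.
        "hidden_weights": base_lr * ratio,
        "output_weights": base_lr * ratio,
    }


# Hypothetical value found by sweeping on a width-256 proxy model.
tuned_lr = 1e-3
print(mup_adam_lrs(tuned_lr, base_width=256, width=4096))
```

The point of the design is that the expensive sweep happens only at the small width; the large model reuses the result through a fixed rescaling rather than a second search.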

Tuning GPT-3 on a Single GPU via Zero-Shot Hyperparameter Transfer

Massachusetts Institute of Technology

#Computer Science #Artificial Intelligence #Natural Language Processing (NLP) #LLM (Large Language Model) #GPT-3 #ChatGPT #Machine Learning #Zero-shot learning (ZSL) #Hyperparameter Optimization
