Explore a comprehensive research seminar that investigates efficient training algorithms for Transformer-based language models, focusing on the computational challenges and effectiveness of various optimization methods. Learn about three key categories of algorithms: dynamic architectures (layer stacking, layer dropping), batch selection (selective backprop, RHO loss), and efficient optimizers (Lion, Sophia). Discover the findings from pre-training BERT and T5 models under fixed computation budgets, and understand the proposed evaluation protocol based on reference system time. Delve into potential pitfalls, experimental setups, and practical implications for training efficiency. Gain insights from speakers Jean Kaddour and Oscar Key as they present their research findings, supported by publicly available code and their published paper. Master concepts including layer stacking, selective backprop, and efficient optimizers, and understand the overheads and conclusions drawn from their extensive experimentation.
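To make the batch-selection category mentioned above more concrete, here is a minimal sketch of a selective-backprop-style training step: compute per-example losses, keep only the hardest fraction of the batch, and backpropagate through those examples alone. This is a simplified top-k variant for illustration (the original selective backprop method selects examples probabilistically based on loss percentiles); the `model`, `optimizer`, and `keep_fraction` names are hypothetical placeholders and are not taken from the speakers' released code.

```python
import torch
import torch.nn.functional as F

def selective_backprop_step(model, optimizer, inputs, targets, keep_fraction=0.5):
    """One training step that backpropagates only through the highest-loss examples.

    Simplified sketch of the selective-backprop idea; `keep_fraction` is a
    hypothetical hyperparameter, not a value from the paper.
    """
    model.train()

    # First forward pass without building a graph: we only need per-example
    # losses to rank the batch.
    with torch.no_grad():
        logits = model(inputs)
        per_example_loss = F.cross_entropy(logits, targets, reduction="none")

    # Keep the top-k hardest (highest-loss) examples.
    k = max(1, int(keep_fraction * inputs.size(0)))
    top_idx = per_example_loss.topk(k).indices

    # Second forward/backward pass on the selected subset only.
    optimizer.zero_grad()
    selected_logits = model(inputs[top_idx])
    loss = F.cross_entropy(selected_logits, targets[top_idx])
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the extra scoring pass is itself an overhead, which is one reason the seminar evaluates these methods against total reference system time rather than step counts.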
No Train No Gain - Revisiting Efficient Training Algorithms for Transformer-based Language Models