1 - Intro & Overview
2 - Frozen Pretrained Transformers
3 - Evaluated Tasks
4 - The Importance of Training LayerNorm
5 - Modality Transfer
6 - Network Architecture Ablation
7 - Evaluation of the Attention Mask
8 - Are FPTs Overfitting or Underfitting?
9 - Model Size Ablation
10 - Is Initialization All You Need?
11 - Full Model Training Overfits
12 - Again the Importance of Training LayerNorm
13 - Conclusions & Comments
Description:
Explore a detailed analysis of a machine learning research paper on pretrained transformers as universal computation engines in this informative video. Dive into the concept of fine-tuning large-scale pretrained models for cross-domain transfer, including from language to vision tasks. Learn about Frozen Pretrained Transformers (FPTs) and their ability to generalize to various sequence classification tasks with minimal fine-tuning. Examine the importance of training LayerNorm, modality transfer, network architecture ablations, and model size considerations. Gain insights into the paper's findings on the superiority of language modeling as a pre-training task for cross-domain transfer and the potential of FPTs to match fully trained transformers in zero-shot generalization.
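As a rough illustration of the FPT setup described above, the sketch below (not the authors' code) freezes a pretrained GPT-2's self-attention and feed-forward weights and leaves only the LayerNorm parameters, positional embeddings, and newly added input/output layers trainable. It assumes PyTorch and the HuggingFace transformers library; the class name FrozenPretrainedTransformer and the input_dim/num_classes parameters are illustrative placeholders for a downstream sequence classification task.

```python
# Minimal sketch of the FPT idea, assuming PyTorch + HuggingFace transformers.
import torch
import torch.nn as nn
from transformers import GPT2Model


class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        hidden = self.gpt2.config.n_embd  # 768 for the base model

        # Freeze all pretrained weights, then keep gradients only for the
        # LayerNorm parameters (names like "h.0.ln_1.weight", "ln_f.bias")
        # and, as in the paper, the positional embeddings ("wpe").
        for name, param in self.gpt2.named_parameters():
            param.requires_grad = ("ln" in name) or name.startswith("wpe")

        # Newly added, fully trainable input projection and output head.
        self.embed_in = nn.Linear(input_dim, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. image patches or bit tokens
        # from the paper's synthetic sequence classification tasks.
        hidden_states = self.gpt2(inputs_embeds=self.embed_in(x)).last_hidden_state
        return self.head(hidden_states[:, -1])  # classify from the last token


# Example usage: classify sequences of 16-dimensional tokens into 10 classes.
model = FrozenPretrainedTransformer(input_dim=16, num_classes=10)
logits = model(torch.randn(2, 64, 16))  # -> shape (2, 10)
```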

Pretrained Transformers as Universal Computation Engines

Yannic Kilcher