1 - Intro & Overview
2 - Frozen Pretrained Transformers
3 - Evaluated Tasks
4 - The Importance of Training LayerNorm
5 - Modality Transfer
6 - Network Architecture Ablation
7 - Evaluation of the Attention Mask
8 - Are FPTs Overfitting or Underfitting?
9 - Model Size Ablation
10 - Is Initialization All You Need?
11 - Full Model Training Overfits
12 - Again the Importance of Training LayerNorm
13 - Conclusions & Comments
Description:
Explore a detailed analysis of a machine learning research paper on pretrained transformers as universal computation engines in this informative video. Dive into the concept of fine-tuning large-scale pretrained models for cross-domain transfer, including from language to vision tasks. Learn about Frozen Pretrained Transformers (FPTs) and their ability to generalize to various sequence classification tasks with minimal fine-tuning. Examine the importance of training LayerNorm, modality transfer, network architecture ablations, and model size considerations. Gain insights into the paper's findings on the superiority of language modeling as a pre-training task for cross-domain transfer and the potential of FPTs to match fully trained transformers in zero-shot generalization.
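As a rough illustration of the FPT setup described above, the sketch below (not the authors' code) freezes a pretrained GPT-2's self-attention and feed-forward weights and leaves only the LayerNorm parameters, positional embeddings, and newly added input/output layers trainable. It assumes PyTorch and the HuggingFace transformers library; the class name FrozenPretrainedTransformer and the input_dim/num_classes parameters are illustrative placeholders for a downstream sequence classification task.

```python
# Minimal sketch of the FPT idea, assuming PyTorch + HuggingFace transformers.
import torch
import torch.nn as nn
from transformers import GPT2Model


class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, input_dim: int, num_classes: int):
        super().__init__()
        self.gpt2 = GPT2Model.from_pretrained("gpt2")
        hidden = self.gpt2.config.n_embd  # 768 for the base model

        # Freeze all pretrained weights, then keep gradients only for the
        # LayerNorm parameters (names like "h.0.ln_1.weight", "ln_f.bias")
        # and, as in the paper, the positional embeddings ("wpe").
        for name, param in self.gpt2.named_parameters():
            param.requires_grad = ("ln" in name) or name.startswith("wpe")

        # Newly added, fully trainable input projection and output head.
        self.embed_in = nn.Linear(input_dim, hidden)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, input_dim), e.g. image patches or bit tokens
        # from the paper's synthetic sequence classification tasks.
        hidden_states = self.gpt2(inputs_embeds=self.embed_in(x)).last_hidden_state
        return self.head(hidden_states[:, -1])  # classify from the last token


# Example usage: classify sequences of 16-dimensional tokens into 10 classes.
model = FrozenPretrainedTransformer(input_dim=16, num_classes=10)
logits = model(torch.randn(2, 64, 16))  # -> shape (2, 10)
```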

Pretrained Transformers as Universal Computation Engines

Yannic Kilcher