1. Intro
2. How do we build foundation models?
3. Successes of transformers in (specific) domains
4. UniT: Unified Transformer across domains
5. Can we take it one step further?
6. How does FLAVA work?
7. Stepping up the evaluation
Description:
Explore the development and capabilities of FLAVA, a unified vision and language model, in this 21-minute conference talk presented by Amanpreet Singh, Research Lead at Hugging Face. Dive into the journey towards creating a holistic universal model that excels in vision tasks, language tasks, and cross- and multi-modal vision and language tasks. Learn about the impressive performance of FLAVA on 35 diverse tasks spanning multiple modalities. Discover the evolution from domain-specific transformer models to the UniT (Unified Transformer) approach, and understand how FLAVA takes this concept even further. Gain insights into the model's architecture, functionality, and evaluation process. This presentation, recorded at Snorkel AI's 2023 Foundation Model Virtual Summit, offers valuable knowledge for those interested in state-of-the-art visio-linguistic pretraining and foundation models in artificial intelligence.
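
The FLAVA model discussed in the talk is available through the Hugging Face transformers library. Below is a minimal sketch, assuming transformers (4.19 or later) and torch are installed, of loading the public facebook/flava-full checkpoint and extracting unimodal and multimodal embeddings for one image-text pair; the example image URL and caption are illustrative placeholders.

from PIL import Image
import requests
from transformers import FlavaProcessor, FlavaModel

# Public FLAVA checkpoint released with the paper.
model = FlavaModel.from_pretrained("facebook/flava-full")
processor = FlavaProcessor.from_pretrained("facebook/flava-full")

# Any image-text pair works; this COCO validation image is a common demo input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["two cats lying on a couch"], images=[image],
                   return_tensors="pt", padding=True)
outputs = model(**inputs)

# One encoder per modality, plus a joint encoder over both:
print(outputs.image_embeddings.shape)       # vision encoder, per-patch hidden states
print(outputs.text_embeddings.shape)        # language encoder, per-token hidden states
print(outputs.multimodal_embeddings.shape)  # multimodal encoder over both inputs

The three outputs reflect the talk's central point: a single model serves vision-only, language-only, and joint vision-and-language inputs.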

Meet FLAVA: A Unified Vision and Language Foundation Model

Snorkel AI