1. Intro
2. How do we build foundation models?
3. Successes of transformers in (specific) domains
4. UniT: Unified Transformer across domains
5. Can we take it one step further?
6. How does FLAVA work?
7. Stepping up the evaluation
Description:
Explore the development and capabilities of FLAVA, a unified vision and language model, in this 21-minute conference talk presented by Amanpreet Singh, Research Lead at Hugging Face. Dive into the journey towards creating a holistic universal model that excels in vision tasks, language tasks, and cross- and multi-modal vision and language tasks. Learn about the impressive performance of FLAVA on 35 diverse tasks spanning multiple modalities. Discover the evolution from domain-specific transformer models to the UniT (Unified Transformer) approach, and understand how FLAVA takes this concept even further. Gain insights into the model's architecture, functionality, and evaluation process. This presentation, recorded at Snorkel AI's 2023 Foundation Model Virtual Summit, offers valuable knowledge for those interested in state-of-the-art visio-linguistic pretraining and foundation models in artificial intelligence.
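
The FLAVA model discussed in the talk is available through the Hugging Face transformers library. Below is a minimal sketch, assuming transformers (4.19 or later) and torch are installed, of loading the public facebook/flava-full checkpoint and extracting unimodal and multimodal embeddings for one image-text pair; the example image URL and caption are illustrative placeholders.

from PIL import Image
import requests
from transformers import FlavaProcessor, FlavaModel

# Public FLAVA checkpoint released with the paper.
model = FlavaModel.from_pretrained("facebook/flava-full")
processor = FlavaProcessor.from_pretrained("facebook/flava-full")

# Any image-text pair works; this COCO validation image is a common demo input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["two cats lying on a couch"], images=[image],
                   return_tensors="pt", padding=True)
outputs = model(**inputs)

# One encoder per modality, plus a joint encoder over both:
print(outputs.image_embeddings.shape)       # vision encoder, per-patch hidden states
print(outputs.text_embeddings.shape)        # language encoder, per-token hidden states
print(outputs.multimodal_embeddings.shape)  # multimodal encoder over both inputs

The three outputs reflect the talk's central point: a single model serves vision-only, language-only, and joint vision-and-language inputs.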

Meet FLAVA: A Unified Vision and Language Foundation Model

Snorkel AI