Chapters:
1. Intro
2. Single-Task Model vs. Unified Model
3. Single-Task Model for Vision
4. Image Output Quantization
5. Text Input for Different Tasks
6. Model Details
7. Objective
8. Dataset and Implementations
9. Pre-training Distribution
10. Evaluation
11. GRIT requires diverse skills
12. Results
13. Semantic Segmentation
14. Depth Estimation
15. Object Detection
16. Image Inpainting
17. Segmentation-based image generation
18. Summary
19. Tasks Distribution
Description:
Explore a groundbreaking unified model for AI tasks in this 49-minute talk presented by Jiasen Lu from AI2. Delve into Unified-IO, the first neural model capable of performing a wide range of tasks across computer vision, image synthesis, vision-and-language, and natural language processing. Learn how this model homogenizes diverse task inputs and outputs into token sequences, achieving broad unification. Discover the model's architecture, training objectives, dataset implementations, and pre-training distribution. Examine evaluation methods, including the GRIT benchmark, and analyze results across various tasks such as semantic segmentation, depth estimation, object detection, image inpainting, and segmentation-based image generation. Gain insights into the future of multi-modal AI models and their potential impact on the field.
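The unification idea described above — serializing every task's inputs and outputs into a single token sequence — can be illustrated with a toy sketch. Everything here (the vocabulary, the number of image bins, the function names) is hypothetical and purely illustrative; it is not the actual Unified-IO tokenizer, which uses a learned quantizer for images.

```python
# Hypothetical sketch of the "everything is a token sequence" idea.
# Vocab and bin counts are made up for illustration.

TEXT_VOCAB = {"segment": 0, "the": 1, "image": 2, "describe": 3}
NUM_TEXT_TOKENS = len(TEXT_VOCAB)
NUM_IMAGE_BINS = 256  # discrete codes a pixel value is quantized into

def encode_text(words):
    """Map words to text-token ids."""
    return [TEXT_VOCAB[w] for w in words]

def encode_image(pixels, bins=NUM_IMAGE_BINS):
    """Quantize 0-1 pixel values into discrete ids, offset past the
    text vocabulary so the two token spaces do not collide."""
    return [NUM_TEXT_TOKENS + min(int(p * bins), bins - 1) for p in pixels]

def unify(task_prompt, image_pixels):
    """One flat sequence for any task: text tokens, then image tokens."""
    return encode_text(task_prompt) + encode_image(image_pixels)

seq = unify(["segment", "the", "image"], [0.0, 0.5, 1.0])
print(seq)  # -> [0, 1, 2, 4, 132, 259]
```

Because both modalities land in one id space, a single sequence-to-sequence transformer can consume and emit them uniformly; the real model replaces the naive per-pixel binning with a learned image quantizer.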

Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks

USC Information Sciences Institute