Explore a groundbreaking unified model for AI tasks in this 49-minute talk presented by Jiasen Lu from AI2. Delve into Unified-IO, the first neural model capable of performing a wide range of tasks across computer vision, image synthesis, vision-and-language, and natural language processing. Learn how the model homogenizes diverse task inputs and outputs into sequences of discrete tokens, achieving broad unification. Discover its architecture, training objectives, datasets, and pre-training distribution. Examine evaluation methods, including the GRIT benchmark, and analyze results across tasks such as semantic segmentation, depth estimation, object detection, image inpainting, and segmentation-based image generation. Gain insights into the future of multi-modal AI models and their potential impact on the field.
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
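The idea of homogenizing inputs and outputs into token sequences can be illustrated with a minimal sketch. This is a hypothetical simplification, not the actual Unified-IO implementation: the vocabulary sizes, bin counts, and helper names (`encode_text`, `encode_box`) are illustrative assumptions, showing only how text and spatial outputs like bounding boxes could share one discrete vocabulary.

```python
# Hypothetical sketch: map different modalities into one shared token space.
# Text occupies one range of IDs; discretized coordinates occupy another.

TEXT_VOCAB = {"<pad>": 0, "what": 1, "is": 2, "this": 3}  # toy text vocabulary
NUM_TEXT_TOKENS = 32000      # assumed size of the text token range
NUM_LOCATION_BINS = 1000     # assumed number of coordinate bins

def encode_text(words):
    """Text -> token IDs drawn from the text range of the vocabulary."""
    return [TEXT_VOCAB[w] for w in words]

def encode_box(x0, y0, x1, y1, image_size=512):
    """Bounding box -> discrete location tokens placed after the text range."""
    def to_bin(v):
        return NUM_TEXT_TOKENS + min(int(v / image_size * NUM_LOCATION_BINS),
                                     NUM_LOCATION_BINS - 1)
    return [to_bin(v) for v in (x0, y0, x1, y1)]

# A question plus a box become one flat sequence of integer token IDs,
# so a single sequence-to-sequence model can consume or emit either.
seq = encode_text(["what", "is", "this"]) + encode_box(10, 20, 200, 300)
```

Because every task is expressed this way, one encoder-decoder can be trained on all of them without task-specific heads.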