Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding

Explore an approach to Visual Question Answering (VQA) that disentangles reasoning from vision and language understanding in this 27-minute lecture from the University of Central Florida. Delve into the task breakdown, the architecture overview, and key components such as parsing questions into executable programs and running those programs against a symbolic scene representation. Examine quantitative results on the CLEVR and CLEVR-Humans datasets, and see how the neural-symbolic method generalizes to new visual domains such as Minecraft. Gain insight into AI systems that combine explicit reasoning with visual and linguistic comprehension.
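To make the two components named above concrete, here is a minimal sketch of the program-execution idea: a question parsed into a small program of operations that is run step by step against a symbolic scene. The scene format, the operation names (filter_color, filter_size, query_shape, count), and the example programs are illustrative assumptions for this sketch, not the lecture's or the paper's actual code.

```python
# Illustrative sketch only: the scene format and operation names below
# are assumptions, not the representations used in the paper or lecture.

# A symbolic scene: each object is a dict of attributes that the
# vision module would recover (hand-written here for the example).
scene = [
    {"shape": "cube",     "color": "red",  "size": "large"},
    {"shape": "sphere",   "color": "blue", "size": "small"},
    {"shape": "cylinder", "color": "red",  "size": "small"},
]

# Each operation maps the current candidate set to a new value:
# a filtered set, a queried attribute, or a count.
def filter_color(objects, color):
    return [o for o in objects if o["color"] == color]

def filter_size(objects, size):
    return [o for o in objects if o["size"] == size]

def query_shape(objects):
    # Assumes earlier steps narrowed the scene to a unique object.
    assert len(objects) == 1, "query expects a unique object"
    return objects[0]["shape"]

def count(objects):
    return len(objects)

OPS = {
    "filter_color": filter_color,
    "filter_size": filter_size,
    "query_shape": query_shape,
    "count": count,
}

def execute(program, scene):
    """Run a linearized program (as a question parser might emit)
    step by step against the symbolic scene representation."""
    state = scene
    for op_name, *args in program:
        state = OPS[op_name](state, *args)
    return state

# "What shape is the small red object?"
print(execute([("filter_size", "small"),
               ("filter_color", "red"),
               ("query_shape",)], scene))   # -> cylinder

# "How many red objects are there?"
print(execute([("filter_color", "red"),
               ("count",)], scene))          # -> 2
```

Because execution is fully symbolic, the reasoning step is transparent and inspectable, which is the disentangling the lecture's title refers to: perception produces the scene, language produces the program, and neither network has to learn the other's job.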