1. Intro
2. Motivation
3. Challenges
4. Algorithm
5. Training
6. Video Representation
7. Scoring function
8. Optimization - Update Rules
9. Exemplar queries
10. Test on Unseen Queries
11. Qualitative results
12. Sentence Encoder
13. Spatial Attention Network - Which regions of the frames to look at?
14. Temporal Attention Model
15. Inference Module
16. Experiments
17. Limitations
18. What is an Inaccuracy?
19. Formulation
20. Detection by Reconstruction
21. Visual Features
22. Inaccuracy Detection
23. Correction
24. Last two chapters
25. How about the opposite problem?
26. Problem Definition
27. Proposed Approach - Generator Block Diagram
28. Text Encoding
29. Start and End Distributions
30. Latent Path Construction
31. Conditional Batch Normalization (CBN)
32. Frame Generation
33. UpPooling Block Details
34. Proposed Approach - Discriminator
35. Loss Function - Generator
36. Hinge GAN Loss on the Discriminator
37. Evaluation Metrics
38. A2D Quantitative Results
39. A2D Results
40. Robotic Results
41. Dissertation Summary
42. Future Work
Description:
Explore video content understanding using text in this 44-minute lecture by Amir Mazaheri from the University of Central Florida. Delve into the challenges, algorithms, and training methods for video representation and scoring functions. Learn about exemplar queries, sentence encoding, spatial attention networks, and temporal attention models. Examine inaccuracy detection and correction techniques, as well as the opposite problem of generating video from text. Discover the proposed approach using generator and discriminator block diagrams, conditional batch normalization, and frame generation. Analyze evaluation metrics, quantitative results, and potential future work in this comprehensive overview of video content analysis and generation techniques.

Video Content Understanding Using Text

University of Central Florida