1. Intro
2. Motivation
3. Challenges
4. Algorithm
5. Training
6. Video Representation
7. Scoring function
8. Optimization - Update Rules
9. Exemplar queries
10. Test on Unseen Queries
11. Qualitative results
12. Sentence Encoder
13. Spatial Attention Network - Which regions of the frames to look at?
14. Temporal Attention Model
15. Inference Module
16. Experiments
17. Limitations
18. What is an Inaccuracy?
19. Formulation
20. Detection by Reconstruction
21. Visual Features
22. Inaccuracy Detection
23. Correction
24. Last two chapters
25. How about the opposite problem?
26. Problem Definition
27. Proposed Approach - Generator Block Diagram
28. Text Encoding
29. Start and End Distributions
30. Latent Path Construction
31. Conditional Batch Normalization (CBN)
32. Frame Generation
33. UpPooling Block Details
34. Proposed Approach - Discriminator
35. Loss Function - Generator
36. Hinge GAN Loss on the Discriminator
37. Evaluation Metrics
38. A2D Quantitative Results
39. A2D Results
40. Robotic Results
41. Dissertation Summary
42. Future Work
Description:
Explore video content understanding using text in this 44-minute lecture by Amir Mazaheri from the University of Central Florida. Delve into the challenges, algorithms, and training methods for video representation and scoring functions. Learn about exemplar queries, sentence encoding, spatial attention networks, and temporal attention models. Examine inaccuracy detection and correction techniques, as well as the opposite problem of generating video from text. Discover the proposed approach using generator and discriminator block diagrams, conditional batch normalization, and frame generation. Analyze evaluation metrics, quantitative results, and potential future work in this comprehensive overview of video content analysis and generation techniques.

Video Content Understanding Using Text

University of Central Florida