1. Intro
2. Background and previous work
3. Self-attention
4. Absolute Position Encoding and Relative Position Encoding (RPE)
5. RPE in Transformer-XL
6. Bias and Contextual Mode
7. A Piecewise Index Function
8. 2D Relative Position Calculation
9. Experiments
10. Implementation details
11. Directed vs. Undirected, Bias vs. Contextual
12. Shared vs. Unshared
13. Piecewise vs. Clip
14. Number of buckets
15. Component-wise analysis
16. Complexity Analysis
17. Visualization
18. Conclusion
Description:
Explore the advancements in Relative Position Encoding (RPE) for Vision Transformers in this 32-minute lecture from the University of Central Florida's CAP6412 2022 series. Delve into the background of self-attention mechanisms and position encoding techniques, focusing on the evolution from absolute to relative position encoding. Examine the improvements made to RPE, including the introduction of bias and contextual modes, a piecewise index function, and 2D relative position calculation. Analyze experimental results comparing directed vs. undirected approaches, shared vs. unshared implementations, and the impact of different numbers of buckets. Gain insights into component-wise analysis, complexity considerations, and visualizations that demonstrate the effectiveness of these enhancements. Conclude with a comprehensive understanding of how these innovations contribute to the performance of Vision Transformers in various computer vision tasks.
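
To make the piecewise index function mentioned above more concrete, the sketch below shows one plausible formulation: relative distances within a threshold alpha are indexed exactly, while larger distances are compressed logarithmically and clipped at beta, so distant positions share buckets. The function name piecewise_index and the default values of alpha, beta, and gamma are illustrative assumptions, not details taken from the lecture.

```python
import numpy as np

def piecewise_index(rel_pos, alpha=1, beta=2, gamma=8):
    """Map a relative distance to a bucket index (illustrative sketch).

    Distances within [-alpha, alpha] keep their exact index; larger
    distances are compressed logarithmically and clipped at +/-beta,
    so far-apart positions fall into shared buckets. gamma controls
    how quickly the logarithmic region saturates.
    """
    rel_pos = np.asarray(rel_pos)
    abs_pos = np.abs(rel_pos)
    # Logarithmic compression for |x| > alpha; clip the index at beta.
    log_idx = alpha + (np.log(np.maximum(abs_pos, alpha) / alpha)
                       / np.log(gamma / alpha) * (beta - alpha))
    far = np.sign(rel_pos) * np.minimum(np.round(log_idx), beta)
    return np.where(abs_pos <= alpha, rel_pos, far).astype(int)

# Nearby offsets keep distinct indices; distant offsets share buckets.
print(piecewise_index(np.arange(-10, 11)))
```

Compared with simple clipping, this keeps fine-grained resolution for small offsets (where attention is typically most sensitive) while bounding the number of buckets needed for large offsets.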

Rethinking and Improving Relative Position Encoding for Vision Transformer - Lecture 23

University of Central Florida