Intro

Contrasting features in ViTs vs CNNs

Global vs Local receptive fields

Data matters, Mr. Obvious

Contrasting receptive fields

Data flow through CLS vs spatial tokens

Skip connections matter a lot in ViTs

Spatial information is preserved in ViTs

Feature evolution with the amount of data

Outro

Description:

This 35-minute video analyzes the paper "Do Vision Transformers See Like Convolutional Neural Networks?". It compares Vision Transformers (ViTs) with ResNets, examining how their learned features differ and which factors cause those differences. Topics include global versus local receptive fields, the effect of training data quantity, the importance of skip connections in ViTs, how ViTs preserve spatial information, and how features evolve as the amount of training data increases. The walkthrough relies on clear explanations and visual intuitions.

Do Vision Transformers See Like Convolutional Neural Networks - Paper Explained

Aleksa Gordić - The AI Epiphany

#Computer Science #Artificial Intelligence #Computer Vision #Data Science #Data Analysis #Machine Learning #Neural Networks #Convolutional Neural Networks (CNN) #Vision Transformers