A high-level overview

tokenization

embeddings and positional encodings

encoder preprocessing: splitting into subspaces

single MHA head explanation

pointwise network

causal masking MHA

source attending MHA

projecting into vocab space and loss function

decoding

Description:

A video walkthrough of the "Attention Is All You Need" paper, which introduced the Transformer model. The explanation follows a simple English-to-German machine translation example through the original architecture, covering tokenization, embeddings, positional encodings, encoder preprocessing, multi-head attention (MHA), pointwise networks, causal masking, source attending, projection into vocabulary space, the loss function, and decoding. The Transformer has since become a standard architecture in natural language processing and beyond.

Attention Is All You Need - Transformer Paper Explained

Aleksa Gordić - The AI Epiphany

#Computer Science #Machine Learning #Deep Learning #Transformer Models #Embeddings #Positional Encoding
