Chapters:
1. Awesome song and introduction
2. Word Embedding
3. Positional Encoding
4. Self-Attention
5. Encoder and Decoder defined
6. Decoder Word Embedding
7. Decoder Positional Encoding
8. Transformers were designed for parallel computing
9. Decoder Self-Attention
10. Encoder-Decoder Attention
11. Decoding numbers into words
12. Decoding the second token
13. Extra stuff you can add to a Transformer
Description:
Dive into a comprehensive 36-minute video explanation of Transformer Neural Networks, the foundation of cutting-edge AI technologies like ChatGPT and Google Translate. Learn about word embedding, positional encoding, self-attention mechanisms, and the encoder-decoder architecture. Explore how Transformers are designed for parallel computing and understand the decoding process. Gain insights into additional components that can enhance Transformer performance. Supplementary links are provided for deeper understanding of related concepts such as backpropagation, SoftMax function, and cosine similarity.
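The core mechanisms named above can be summarized in a few lines of code. Below is a minimal NumPy sketch (not taken from the video) of sinusoidal positional encoding added to toy word embeddings, followed by single-head scaled dot-product self-attention with a SoftMax over the attention scores. The function names, dimensions, and the identity Q/K/V projections are illustrative assumptions, not the video's implementation.

# Illustrative NumPy sketch: positional encoding + self-attention on toy embeddings.
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def self_attention(x):
    # Single-head scaled dot-product attention; Q, K, V use identity
    # projections here instead of learned weight matrices (toy example).
    q, k, v = x, x, x
    d = x.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # SoftMax over each row
    return weights @ v

# Toy "word embeddings" for a 3-token sentence with d_model = 4.
embeddings = np.random.randn(3, 4)
x = embeddings + positional_encoding(3, 4)   # add position information
print(self_attention(x).shape)               # (3, 4): one output per token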

Transformer Neural Networks, ChatGPT's Foundation, Clearly Explained

StatQuest with Josh Starmer