1. What we will cover
2. Introducing Colab
3. Word Embeddings and d_model
4. What are Attention heads?
5. What is Dropout?
6. Why batch data?
7. How to feed sentences into the transformer?
8. Why feed forward layers in transformer?
9. Why Repeating Encoder layers?
10. The “Encoder” Class, nn.Module, nn.Sequential
11. The “EncoderLayer” Class
12. What is Attention: Query, Key, Value vectors
13. What is Attention: Matrix Transpose in PyTorch
14. What is Attention: Scaling
15. What is Attention: Masking
16. What is Attention: Softmax
17. What is Attention: Value Tensors
18. CRUX OF VIDEO: “MultiHeadAttention” Class
19. Returning the flow back to “EncoderLayer” Class
20. Layer Normalization
21. Returning the flow back to “EncoderLayer” Class
22. Feed Forward Layers
23. Why Activation Functions?
24. Finish the Flow of Encoder
25. Conclusion & Decoder for next video
Description:
Dive into a comprehensive 50-minute video tutorial that breaks down the Transformer Encoder architecture into 100 lines of code. Learn about word embeddings, attention heads, dropout, data batching, and the intricacies of the encoder layers. Explore key concepts such as multi-head attention, layer normalization, and feed-forward networks. Gain hands-on experience with PyTorch implementations, including nn.Module and nn.Sequential. Understand the flow of data through the encoder and discover why certain components are crucial for the transformer's performance. Perfect for those looking to deepen their understanding of natural language processing and deep learning architectures.
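
For a taste of the kind of code the video builds up, below is a minimal sketch (not the video's exact implementation) of one encoder layer in PyTorch, showing the pieces named above: multi-head attention, layer normalization, feed-forward layers, dropout, nn.Module, and repeating layers with nn.Sequential. The hyperparameter names and values (d_model, num_heads, d_ff, drop_prob) are illustrative assumptions, and PyTorch's built-in nn.MultiheadAttention stands in for the from-scratch MultiHeadAttention class the video writes.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, drop_prob=0.1):
        super().__init__()
        # built-in multi-head attention as a stand-in for a from-scratch version
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        # position-wise feed-forward network with an activation between the linear layers
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(drop_prob),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(drop_prob)

    def forward(self, x):
        # self-attention sub-layer with residual connection and layer normalization
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # feed-forward sub-layer with residual connection and layer normalization
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Repeating encoder layers: stack them with nn.Sequential
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
x = torch.randn(2, 10, 512)     # (batch, sequence length, d_model) word embeddings
print(encoder(x).shape)         # torch.Size([2, 10, 512])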

Transformer Encoder in 100 Lines of Code

CodeEmporium