1. What we will cover
2. Introducing Colab
3. Word Embeddings and d_model
4. What are Attention heads?
5. What is Dropout?
6. Why batch data?
7. How to feed sentences into the transformer?
8. Why feed forward layers in transformer?
9. Why Repeating Encoder layers?
10. The “Encoder” Class, nn.Module, nn.Sequential
11. The “EncoderLayer” Class
12. What is Attention: Query, Key, Value vectors
13. What is Attention: Matrix Transpose in PyTorch
14. What is Attention: Scaling
15. What is Attention: Masking
16. What is Attention: Softmax
17. What is Attention: Value Tensors
18. CRUX OF VIDEO: “MultiHeadAttention” Class
19. Returning the flow back to “EncoderLayer” Class
20. Layer Normalization
21. Returning the flow back to “EncoderLayer” Class
22. Feed Forward Layers
23. Why Activation Functions?
24. Finish the Flow of Encoder
25. Conclusion & Decoder for next video
Description:
Dive into a comprehensive 50-minute video tutorial that breaks down the Transformer Encoder architecture into 100 lines of code. Learn about word embeddings, attention heads, dropout, data batching, and the intricacies of the encoder layers. Explore key concepts such as multi-head attention, layer normalization, and feed-forward networks. Gain hands-on experience with PyTorch implementations, including nn.Module and nn.Sequential. Understand the flow of data through the encoder and discover why certain components are crucial for the transformer's performance. Perfect for those looking to deepen their understanding of natural language processing and deep learning architectures.
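
For a taste of the kind of code the video builds up, below is a minimal sketch (not the video's exact implementation) of one encoder layer in PyTorch, showing the pieces named above: multi-head attention, layer normalization, feed-forward layers, dropout, nn.Module, and repeating layers with nn.Sequential. The hyperparameter names and values (d_model, num_heads, d_ff, drop_prob) are illustrative assumptions, and PyTorch's built-in nn.MultiheadAttention stands in for the from-scratch MultiHeadAttention class the video writes.

import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, num_heads=8, d_ff=2048, drop_prob=0.1):
        super().__init__()
        # built-in multi-head attention as a stand-in for a from-scratch version
        self.attention = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        # position-wise feed-forward network with an activation between the linear layers
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(drop_prob),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(drop_prob)

    def forward(self, x):
        # self-attention sub-layer with residual connection and layer normalization
        attn_out, _ = self.attention(x, x, x)
        x = self.norm1(x + self.dropout(attn_out))
        # feed-forward sub-layer with residual connection and layer normalization
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# Repeating encoder layers: stack them with nn.Sequential
encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])
x = torch.randn(2, 10, 512)     # (batch, sequence length, d_model) word embeddings
print(encoder(x).shape)         # torch.Size([2, 10, 512])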

Transformer Encoder in 100 Lines of Code

CodeEmporium