counting bigrams in a 2D torch tensor ("training the model")
visualizing the bigram tensor
deleting spurious S and E tokens in favor of a single . token
sampling from the model
efficiency! vectorized normalization of the rows, tensor broadcasting
loss function: the negative log likelihood of the data under our model
model smoothing with fake counts
PART 2: the neural network approach: intro
creating the bigram dataset for the neural net
feeding integers into neural nets? one-hot encodings
the "neural net": one linear layer of neurons implemented with matrix multiplication
transforming neural net outputs into probabilities: the softmax
summary, preview to next steps, reference to micrograd
vectorized loss
backward and update, in PyTorch
putting everything together
note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
note 2: model smoothing as regularization loss
sampling from the neural net
conclusion
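The chapters above walk through two implementations of the same bigram model, first with counts and then with a tiny neural net; the sketches below illustrate the main steps. First, a minimal sketch of the count-based Part 1: counting bigrams into a 2D torch tensor, smoothing with fake counts, normalizing the rows with broadcasting, measuring the average negative log likelihood, and sampling. It assumes a names.txt file of lowercase names, one per line (as in the makemore repository), and a '.' boundary token; the variable names and details are illustrative rather than the video's exact code.

```python
import torch

# Assumed input: names.txt with one lowercase name per line (as in the makemore repo).
words = open('names.txt', 'r').read().splitlines()

# Vocabulary: the letters in the data plus a single '.' token marking start and end.
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}
vocab_size = len(stoi)  # 27 for plain lowercase names

# Count bigrams into a 2D tensor: N[i, j] = how often token j follows token i.
N = torch.zeros((vocab_size, vocab_size), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Model smoothing with fake counts: +1 avoids zero probabilities (and infinite loss).
# Broadcasting then divides the (27, 27) counts by their (27, 1) row sums.
P = (N + 1).float()
P /= P.sum(1, keepdim=True)

# Loss: average negative log likelihood of the training bigrams under the model.
log_likelihood, n = 0.0, 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
print('nll =', (-log_likelihood / n).item())

# Sampling: start at '.', repeatedly draw the next character from the current row.
g = torch.Generator().manual_seed(2147483647)
for _ in range(5):
    out, ix = [], 0
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
```

The normalization step is where broadcasting earns its keep: a (27, 27) matrix is divided by a (27, 1) column of row sums, and keepdim=True is what keeps the division aligned along rows rather than silently broadcasting along the wrong dimension.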
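Next, a corresponding sketch of Part 2: the same bigram model expressed as a single linear layer trained with gradient descent. It covers the bigram dataset, one-hot encodings, the matrix multiply, the softmax, the vectorized loss with a regularization term (the analogue of count smoothing), the backward/update step, and sampling from the trained net. It reuses the assumed names.txt setup; the learning rate, step count, and regularization strength are illustrative.

```python
import torch
import torch.nn.functional as F

# Same assumed setup as the counting sketch: names.txt and a '.'-delimited vocabulary.
words = open('names.txt', 'r').read().splitlines()
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}
vocab_size = len(stoi)

# Bigram dataset for the neural net: xs is the current token, ys the token that follows.
xs, ys = [], []
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)
num = xs.nelement()

# The "neural net": one linear layer of vocab_size neurons, i.e. a single weight matrix.
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((vocab_size, vocab_size), generator=g, requires_grad=True)

for k in range(100):
    # Forward pass: integers -> one-hot vectors -> logits -> softmax probabilities.
    xenc = F.one_hot(xs, num_classes=vocab_size).float()
    logits = xenc @ W                              # interpreted as log-counts
    counts = logits.exp()                          # softmax, step 1: positive "counts"
    probs = counts / counts.sum(1, keepdim=True)   # softmax, step 2: rows sum to 1
    # Vectorized negative log likelihood, plus a W**2 regularization term
    # (playing the role of the fake-count smoothing in Part 1).
    loss = -probs[torch.arange(num), ys].log().mean() + 0.01 * (W ** 2).mean()

    # Backward pass and parameter update (plain gradient descent).
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad

print('loss =', loss.item())

# Sampling from the neural net: the same loop as before, except each row of
# probabilities now comes from the trained layer instead of the count table.
for _ in range(5):
    out, ix = [], 0
    while True:
        xenc = F.one_hot(torch.tensor([ix]), num_classes=vocab_size).float()
        p = (xenc @ W).exp()
        p = p / p.sum(1, keepdim=True)
        ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
```

Because the weights act as log-counts, training this layer recovers essentially the same probability table as Part 1, and pulling the weights toward zero with the regularization term has much the same effect as adding fake counts: it nudges each row toward the uniform distribution.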
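Finally, a tiny check of "note 1" from the outline: multiplying a one-hot vector into the weight matrix is exactly the same as indexing out one row of it. The index below is arbitrary and purely for illustration.

```python
import torch
import torch.nn.functional as F

W = torch.randn((27, 27))
ix = 13  # arbitrary token index, for illustration
xenc = F.one_hot(torch.tensor(ix), num_classes=27).float()
# The matmul with a one-hot vector just plucks out row ix of W.
print(torch.allclose(xenc @ W, W[ix]))  # True
```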
Description:
Dive into a comprehensive video tutorial on building a bigram character-level language model, which serves as a foundation for more complex Transformer language models such as GPT. Learn about torch.Tensor and its subtleties, efficient neural network evaluation, and the framework of language modeling: training, sampling, and loss evaluation. The walkthrough covers dataset exploration, bigram counting, tensor visualization, sampling from the model, vectorized normalization, the negative log likelihood loss, the neural-network approach, one-hot encodings, and the softmax. Hands-on implementation, together with the provided resources and exercises, rounds out the core concepts of language modeling.
The Spelled-Out Intro to Language Modeling - Building Makemore