counting bigrams in a 2D torch tensor ("training the model")
visualizing the bigram tensor
deleting spurious S and E tokens in favor of a single . token
sampling from the model
efficiency! vectorized normalization of the rows, tensor broadcasting
loss function: the negative log likelihood of the data under our model
model smoothing with fake counts
PART 2: the neural network approach: intro
creating the bigram dataset for the neural net
feeding integers into neural nets? one-hot encodings
the "neural net": one linear layer of neurons implemented with matrix multiplication
transforming neural net outputs into probabilities: the softmax
summary, preview to next steps, reference to micrograd
vectorized loss
backward and update, in PyTorch
putting everything together
note 1: one-hot encoding really just selects a row of the next Linear layer's weight matrix
note 2: model smoothing as regularization loss
sampling from the neural net
conclusion
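The chapters above walk through two implementations of the same bigram model, first with counts and then with a tiny neural net; the sketches below illustrate the main steps. First, a minimal sketch of the count-based Part 1: counting bigrams into a 2D torch tensor, smoothing with fake counts, normalizing the rows with broadcasting, measuring the average negative log likelihood, and sampling. It assumes a names.txt file of lowercase names, one per line (as in the makemore repository), and a '.' boundary token; the variable names and details are illustrative rather than the video's exact code.

```python
import torch

# Assumed input: names.txt with one lowercase name per line (as in the makemore repo).
words = open('names.txt', 'r').read().splitlines()

# Vocabulary: the letters in the data plus a single '.' token marking start and end.
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}
vocab_size = len(stoi)  # 27 for plain lowercase names

# Count bigrams into a 2D tensor: N[i, j] = how often token j follows token i.
N = torch.zeros((vocab_size, vocab_size), dtype=torch.int32)
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        N[stoi[ch1], stoi[ch2]] += 1

# Model smoothing with fake counts: +1 avoids zero probabilities (and infinite loss).
# Broadcasting then divides the (27, 27) counts by their (27, 1) row sums.
P = (N + 1).float()
P /= P.sum(1, keepdim=True)

# Loss: average negative log likelihood of the training bigrams under the model.
log_likelihood, n = 0.0, 0
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        log_likelihood += torch.log(P[stoi[ch1], stoi[ch2]])
        n += 1
print('nll =', (-log_likelihood / n).item())

# Sampling: start at '.', repeatedly draw the next character from the current row.
g = torch.Generator().manual_seed(2147483647)
for _ in range(5):
    out, ix = [], 0
    while True:
        ix = torch.multinomial(P[ix], num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
```

The normalization step is where broadcasting earns its keep: a (27, 27) matrix is divided by a (27, 1) column of row sums, and keepdim=True is what keeps the division aligned along rows rather than silently broadcasting along the wrong dimension.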
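Next, a corresponding sketch of Part 2: the same bigram model expressed as a single linear layer trained with gradient descent. It covers the bigram dataset, one-hot encodings, the matrix multiply, the softmax, the vectorized loss with a regularization term (the analogue of count smoothing), the backward/update step, and sampling from the trained net. It reuses the assumed names.txt setup; the learning rate, step count, and regularization strength are illustrative.

```python
import torch
import torch.nn.functional as F

# Same assumed setup as the counting sketch: names.txt and a '.'-delimited vocabulary.
words = open('names.txt', 'r').read().splitlines()
chars = sorted(set(''.join(words)))
stoi = {s: i + 1 for i, s in enumerate(chars)}
stoi['.'] = 0
itos = {i: s for s, i in stoi.items()}
vocab_size = len(stoi)

# Bigram dataset for the neural net: xs is the current token, ys the token that follows.
xs, ys = [], []
for w in words:
    chs = ['.'] + list(w) + ['.']
    for ch1, ch2 in zip(chs, chs[1:]):
        xs.append(stoi[ch1])
        ys.append(stoi[ch2])
xs, ys = torch.tensor(xs), torch.tensor(ys)
num = xs.nelement()

# The "neural net": one linear layer of vocab_size neurons, i.e. a single weight matrix.
g = torch.Generator().manual_seed(2147483647)
W = torch.randn((vocab_size, vocab_size), generator=g, requires_grad=True)

for k in range(100):
    # Forward pass: integers -> one-hot vectors -> logits -> softmax probabilities.
    xenc = F.one_hot(xs, num_classes=vocab_size).float()
    logits = xenc @ W                              # interpreted as log-counts
    counts = logits.exp()                          # softmax, step 1: positive "counts"
    probs = counts / counts.sum(1, keepdim=True)   # softmax, step 2: rows sum to 1
    # Vectorized negative log likelihood, plus a W**2 regularization term
    # (playing the role of the fake-count smoothing in Part 1).
    loss = -probs[torch.arange(num), ys].log().mean() + 0.01 * (W ** 2).mean()

    # Backward pass and parameter update (plain gradient descent).
    W.grad = None
    loss.backward()
    W.data += -50 * W.grad

print('loss =', loss.item())

# Sampling from the neural net: the same loop as before, except each row of
# probabilities now comes from the trained layer instead of the count table.
for _ in range(5):
    out, ix = [], 0
    while True:
        xenc = F.one_hot(torch.tensor([ix]), num_classes=vocab_size).float()
        p = (xenc @ W).exp()
        p = p / p.sum(1, keepdim=True)
        ix = torch.multinomial(p, num_samples=1, replacement=True, generator=g).item()
        if ix == 0:
            break
        out.append(itos[ix])
    print(''.join(out))
```

Because the weights act as log-counts, training this layer recovers essentially the same probability table as Part 1, and pulling the weights toward zero with the regularization term has much the same effect as adding fake counts: it nudges each row toward the uniform distribution.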
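Finally, a tiny check of "note 1" from the outline: multiplying a one-hot vector into the weight matrix is exactly the same as indexing out one row of it. The index below is arbitrary and purely for illustration.

```python
import torch
import torch.nn.functional as F

W = torch.randn((27, 27))
ix = 13  # arbitrary token index, for illustration
xenc = F.one_hot(torch.tensor(ix), num_classes=27).float()
# The matmul with a one-hot vector just plucks out row ix of W.
print(torch.allclose(xenc @ W, W[ix]))  # True
```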
Description:
Dive into a comprehensive video tutorial on building a bigram character-level language model, which serves as a foundation for more complex Transformer language models such as GPT. Learn about torch.Tensor and its subtleties, efficient neural network evaluation, and the framework of language modeling: training, sampling, and loss evaluation. The walkthrough covers dataset exploration, bigram counting, tensor visualization, sampling from the model, vectorized normalization, the negative log likelihood loss, the neural-network approach, one-hot encodings, and the softmax. Hands-on implementation, together with the provided resources and exercises, rounds out the core concepts of language modeling.
The Spelled-Out Intro to Language Modeling - Building Makemore