31. Defining a custom learning schedule with annealing (first sketch below)
32. How to train on completions only, like OpenAI's default (second sketch below)
33. Running training on Llama 3.2 1B
34. Performance evaluation after fine-tuning Llama 3.2
35. Using augmented synthetic data to improve maths performance (Advanced / Speculative!)
36. Evaluating the baseline maths performance of Llama 3.2 1B
37. Fine-tuning on a training split of the lighteval/MATH dataset
38. Training on synthetic data from Llama 3.1 8B instead of the training split (third sketch below)
39. Comparing results of training on the training split vs. on synthetic Llama 3.1 8B answers
40. Training on an augmented synthetic dataset generated with Llama 3.1 8B and ground-truth answers
41. Comparing all results: base vs. fine-tuned on the raw dataset vs. 8B synthetic vs. 8B synthetic with augmentation
42. How to use augmented data if you have access to user conversations or feedback
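A note on item 31: "annealing" here means decaying the learning rate after a warmup phase. Below is a minimal sketch of one common variant, linear warmup followed by cosine annealing, built with PyTorch's LambdaLR; the toy model, peak learning rate, and step counts are illustrative assumptions, not the tutorial's actual settings.

```python
# Minimal sketch: linear warmup + cosine annealing via LambdaLR.
# The model, peak LR, and step counts below are illustrative only.
import math
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the LLM being fine-tuned
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)

warmup_steps, total_steps = 100, 1000

def lr_factor(step: int) -> float:
    if step < warmup_steps:
        # Linear warmup: scale the peak LR from 0 up to 1.
        return step / max(1, warmup_steps)
    # Cosine annealing: decay the factor from 1 back down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)

for step in range(total_steps):
    # forward pass, loss.backward(), etc. would go here
    optimizer.step()
    scheduler.step()  # advance the schedule once per optimizer step
```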
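For item 32, completion-only training, the standard trick with Hugging Face models is to set the label of every prompt token to -100 (PyTorch's cross-entropy ignore index) so the loss is computed only over the completion. A minimal sketch, with the tokenizer name chosen purely for illustration:

```python
# Minimal sketch of completion-only loss masking.
# The tokenizer name is illustrative; gated models require authentication.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

def build_example(prompt: str, completion: str) -> dict:
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]
    # -100 on prompt positions excludes them from the cross-entropy loss,
    # so gradients come only from the completion tokens.
    return {
        "input_ids": prompt_ids + completion_ids,
        "labels": [-100] * len(prompt_ids) + completion_ids,
    }
```

TRL's DataCollatorForCompletionOnlyLM applies the same masking automatically if you train with its SFTTrainer.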
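For the synthetic-data chapters (items 38–41), the core step is sampling answers to the training questions from the larger 8B model. Here is a minimal sketch using the transformers text-generation pipeline, assuming a recent version that accepts chat-style message lists; the system prompt and sampling settings are illustrative assumptions:

```python
# Minimal sketch of generating synthetic answers with Llama 3.1 8B.
# The model name, system prompt, and sampling settings are illustrative.
from transformers import pipeline

generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

def synthesize_answer(question: str) -> str:
    messages = [
        {"role": "system", "content": "Solve the problem step by step."},
        {"role": "user", "content": question},
    ]
    out = generator(messages, max_new_tokens=512, do_sample=True, temperature=0.7)
    # The pipeline returns the full conversation; the last message is the answer.
    return out[0]["generated_text"][-1]["content"]
```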
Description:
Dive into an extensive tutorial on synthetic data generation and fine-tuning techniques for large language models such as OpenAI's GPT-4o and Llama 3. Learn how to create synthetic questions and answers, implement chain-of-thought reasoning, and augment data from various sources, including documents and structured data. Explore GPU setup, data extraction from PDFs, and the process of fine-tuning both OpenAI and open-source models. Master advanced concepts such as LoRA adapters, custom learning schedules, and performance evaluation methods. Discover strategies to improve model performance in specific domains such as mathematics using augmented synthetic datasets. Gain practical insights into leveraging user conversations and feedback to enhance model capabilities.
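As one concrete example of the adapter-based fine-tuning mentioned above, here is a minimal sketch of attaching a LoRA adapter with the peft library; the rank, alpha, and target module names are illustrative assumptions, not the tutorial's settings.

```python
# Minimal sketch of wrapping a causal LM with a LoRA adapter via peft.
# r, lora_alpha, and target_modules are illustrative choices.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
config = LoraConfig(
    r=16,                                 # adapter rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction is trainable
```

Because only the adapter weights receive gradients, this style of fine-tuning typically fits on a single GPU for models of this size.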
Synthetic Data Generation and Fine-tuning for OpenAI GPT-4 or Llama 3