Synthetic Data Generation and Fine-tuning for OpenAI GPT-4 or Llama 3
Trelis Research

Chapters:
1. How to generate synthetic data for fine-tuning
2. Video overview: fine-tuning OpenAI or Llama 3 models
3. Synthetic question generation (sketch below)
4. Synthetic answer generation
5. Why chain of thought is important in synthetic data
6. Augmented synthetic data
7. Generating synthetic data from documents
8. Synthetic data from structured data
9. Generating data from user conversations
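
A minimal sketch of the question-generation step covered in chapters 3 and 7, assuming the OpenAI Python SDK; the model name, prompts, and chunking are illustrative, not the exact ones used in the video.

```python
# Generate synthetic questions from a document chunk.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

chunk = "..."  # a passage pulled from your source document

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You write exam-style questions."},
        {
            "role": "user",
            "content": (
                "Write three questions that can be answered using only the "
                f"passage below.\n\nPassage:\n{chunk}"
            ),
        },
    ],
    temperature=0.7,  # some sampling diversity helps dataset coverage
)
print(response.choices[0].message.content)
```
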
10. GPU and notebook setup
11. OpenAI notebook: data generation and fine-tuning
12. Data extraction from PDFs
13. Synthetic data generation for GPT-4o-mini fine-tuning
14. Generating synthetic questions using structured outputs (sketch below)
15. Generating synthetic answers
16. Saving data in JSONL format for OpenAI fine-tuning
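
A sketch of chapters 14-16: parsing questions into a Pydantic schema via the SDK's structured-outputs helper, then writing chat-format records to JSONL. Assumes a recent openai package; the schema, file name, and placeholder answers are illustrative (answers come from a separate generation step, chapter 15).

```python
import json
from pydantic import BaseModel
from openai import OpenAI

class Questions(BaseModel):
    questions: list[str]

client = OpenAI()

completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write 3 questions about LoRA fine-tuning."}],
    response_format=Questions,  # the SDK parses the reply into this schema
)
qs = completion.choices[0].message.parsed.questions

# Each JSONL line is one training example in OpenAI's chat format.
answers = ["..."] * len(qs)  # placeholder: generate answers separately
with open("train.jsonl", "w") as f:
    for q, a in zip(qs, answers):
        record = {"messages": [
            {"role": "user", "content": q},
            {"role": "assistant", "content": a},
        ]}
        f.write(json.dumps(record) + "\n")
```
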
17. How to fine-tune an OpenAI model on a synthetic dataset
18. Using an LLM as a judge for evaluation (sketch below)
19. Evaluation of GPT-4o-mini versus the fine-tuned model
20. How to increase and improve the training data
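
A sketch of the LLM-as-judge idea in chapter 18: a grader model scores a candidate answer against a reference. The 1-5 scale and prompt wording are assumptions, not the video's exact rubric; using a stronger model than the one under test is the usual design choice.

```python
from openai import OpenAI

client = OpenAI()

def judge(question: str, reference: str, candidate: str) -> int:
    """Return a 1-5 correctness score for the candidate answer."""
    prompt = (
        "Score the candidate answer against the reference on a 1-5 scale, "
        "where 5 means fully correct and complete. Reply with the number only.\n\n"
        f"Question: {question}\nReference: {reference}\nCandidate: {candidate}"
    )
    reply = client.chat.completions.create(
        model="gpt-4o",  # judge with a stronger model than the one being evaluated
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic grading
    )
    return int(reply.choices[0].message.content.strip())
```
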
21. Fine-tuning open-source models like Llama 3
22. Pushing a synthetic dataset to the Hugging Face Hub
23. Loading a model with transformers or Unsloth
24. Setting generation parameters, incl. temperature and top-p
25. Batch generation with transformers or Unsloth, incl. padding and chat templating (sketch below)
26. Llama 3.2 1B model performance before fine-tuning
27. Fine-tuning on synthetic data with Unsloth or transformers
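
A sketch of chapters 24-25: batched generation with transformers, using left padding, the model's chat template, and sampling parameters such as temperature and top-p. The model ID is illustrative; Unsloth exposes a similar interface.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"            # decoder-only models pad on the left
tokenizer.pad_token = tokenizer.eos_token  # Llama has no dedicated pad token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompts = ["What is LoRA?", "Explain top-p sampling."]
chats = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": p}], tokenize=False, add_generation_prompt=True
    )
    for p in prompts
]
inputs = tokenizer(chats, return_tensors="pt", padding=True).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,  # lower values give more deterministic outputs
    top_p=0.9,        # nucleus sampling cutoff
)
# Strip the prompt tokens so only the generated completions are decoded.
print(tokenizer.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
))
```
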
28. LoRA adapter setup, rescaled LoRA, and choice of rank and alpha (sketch below)
29. Dataset preparation for fine-tuning, incl. prompt formatting
30. SFTTrainer setup, incl. epochs, batch size, and gradient accumulation
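
A sketch of chapters 28-30, assuming peft and a recent trl with SFTConfig: a LoRA adapter with rescaled LoRA (rsLoRA) enabled, plugged into SFTTrainer. All hyperparameters and file names are placeholders, not the values chosen in the video.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
dataset = load_dataset("json", data_files="train.jsonl", split="train")  # chat-format records

peft_config = LoraConfig(
    r=16,             # adapter rank
    lora_alpha=16,    # with rsLoRA, scaling is alpha / sqrt(r) rather than alpha / r
    use_rslora=True,  # rescaled LoRA keeps the update scale stable as rank grows
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

args = SFTConfig(
    output_dir="out",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch size of 16
    learning_rate=2e-4,
)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=peft_config)
trainer.train()
```
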
31. Defining a custom learning rate schedule with annealing
32. How to train on completions only, like OpenAI's default (sketch below)
33. Running training on Llama 3.2 1B
34. Performance evaluation after fine-tuning Llama 3.2
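
A sketch of chapters 31-32: computing loss on completions only (as OpenAI fine-tuning does by default), plus a warmup-then-cosine-annealing learning rate schedule. It assumes a trl version that provides DataCollatorForCompletionOnlyLM; the response template matches Llama 3 chat formatting and the step counts are placeholders.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          get_cosine_schedule_with_warmup)
from trl import DataCollatorForCompletionOnlyLM

model_id = "meta-llama/Llama-3.2-1B-Instruct"  # illustrative
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Everything before the assistant header gets label -100, so no loss is
# computed on the prompt and the model learns from completions only.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|start_header_id|>assistant<|end_header_id|>",
    tokenizer=tokenizer,
)

# Linear warmup, then cosine annealing toward zero over the full run.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=20, num_training_steps=500
)
# Hand both to SFTTrainer via data_collator=collator and
# optimizers=(optimizer, scheduler) to override the defaults.
```
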
35. Using augmented synthetic data to improve maths performance (Advanced / Speculative!)
36. Evaluating the baseline maths performance of Llama 3.2 1B
37. Fine-tuning on a training split of the lighteval/MATH dataset
38. Training on synthetic data from Llama 3.1 8B instead of the training split
39. Comparing results of training on the training split vs. on synthetic Llama 3.1 8B answers
40. Training on an augmented synthetic dataset generated with Llama 3.1 8B and ground-truth answers (sketch below)
41. Comparing all results: base vs. fine-tuned on the raw dataset vs. 8B synthetic vs. 8B synthetic with augmentation
42. How to use augmented data if you have access to user conversations or feedback
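
A sketch of the augmentation idea behind chapters 40-42: show the generating model the ground-truth solution so its chain-of-thought answers are anchored to correct results. The dataset config and field names are assumptions based on the public lighteval/MATH layout; the prompt wording is illustrative.

```python
from datasets import load_dataset

math_train = load_dataset("lighteval/MATH", "all", split="train")  # config name may differ

def augmentation_prompt(example: dict) -> str:
    # The ground-truth solution is shown to the model so the generated
    # chain of thought lands on a correct final answer.
    return (
        "Write a clear, step-by-step solution to the problem below. "
        "Your final answer must agree with this reference solution:\n"
        f"{example['solution']}\n\n"
        f"Problem: {example['problem']}"
    )

prompts = [augmentation_prompt(ex) for ex in math_train.select(range(100))]
# Feed these prompts through the batched-generation sketch above (with
# Llama 3.1 8B) and keep the generated solutions as training targets.
```
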
Description:
Dive into an extensive tutorial on synthetic data generation and fine-tuning techniques for large language models like OpenAI GPT-4o and Llama 3. Learn how to create synthetic questions and answers, implement chain of thought reasoning, and augment data from various sources including documents and structured data. Explore GPU setup, data extraction from PDFs, and the process of fine-tuning both OpenAI and open-source models. Master advanced concepts such as LoRA adapters, custom learning schedules, and performance evaluation methods. Discover strategies to improve model performance in specific domains like mathematics using augmented synthetic datasets. Gain practical insights on leveraging user conversations and feedback to enhance model capabilities.
