1. Intro
2. Self-Rewarding Language Architecture
3. Fine-Tuning Scripts
4. Data for Fine-Tuning
5. Supervised Fine-Tuning Script
6. High LoRA Alpha and Quantization
7. Evaluation Fine-Tuning Data
8. Generating New Prompts
9. Live Demo of Prompt Gen
10. Generating Responses
11. Generating Scores
12. Config, Compute, and Cost
13. Analyzing Scores
14. Live Run of DPO
Description:
Learn how to fine-tune Mistral 7B using Meta's Self-Rewarding Language Models approach in this comprehensive technical tutorial video. Explore the self-rewarding language architecture, understand the fine-tuning process using LoRA, and follow along with practical demonstrations of prompt generation and scoring. Master the implementation details, including supervised fine-tuning scripts, data preparation, evaluation methods, and configuration settings. Watch live demonstrations of prompt generation and DPO (Direct Preference Optimization) runs while gaining insights into compute requirements and cost considerations. Access the provided code repositories, datasets, and additional resources to implement self-rewarding language models in your own projects. Connect with the Oxen community through Discord and join their Arxiv Dives series for more in-depth AI discussions.
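
The description mentions fine-tuning with LoRA and quantization. Below is a minimal sketch of how that setup might look with Hugging Face transformers and peft; the model name, rank, and alpha values are illustrative assumptions, not the actual script walked through in the video.

```python
# Sketch: attach a LoRA adapter to a 4-bit-quantized Mistral 7B base model.
# Model ID and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base model

# Quantize the frozen base weights to 4-bit so the 7B model fits on one GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Only the low-rank adapter matrices are trained; lora_alpha scales their
# contribution relative to the frozen base weights (a "high LoRA alpha"
# gives the adapter more influence).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The resulting PEFT model can then be passed to a supervised fine-tuning or DPO trainer for the preference-optimization stages described above.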

Fine-Tuning Self-Rewarding Language Models with Mistral 7B

Oxen