1. Intro
2. Self-Rewarding Language Architecture
3. Fine-Tuning Scripts
4. Data for Fine-Tuning
5. Supervised Fine-Tuning Script
6. High LoRA Alpha and Quantization
7. Evaluation Fine-Tuning Data
8. Generating New Prompts
9. Live Demo of Prompt Gen
10. Generating Responses
11. Generating Scores
12. Config, Compute, and Cost
13. Analyzing Scores
14. Live Run of DPO
Description:
Learn how to fine-tune Mistral 7B using Meta's Self-Rewarding Language Models approach in this comprehensive technical tutorial video. Explore the self-rewarding language architecture, understand the fine-tuning process using LoRA, and follow along with practical demonstrations of prompt generation and scoring. Master the implementation details, including supervised fine-tuning scripts, data preparation, evaluation methods, and configuration settings. Watch live demonstrations of prompt generation and DPO (Direct Preference Optimization) runs while gaining insights into compute requirements and cost considerations. Access the provided code repositories, datasets, and additional resources to implement self-rewarding language models in your own projects. Connect with the Oxen community through Discord and join their Arxiv Dives series for more in-depth AI discussions.
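
The description mentions fine-tuning with LoRA and quantization. Below is a minimal sketch of how that setup might look with Hugging Face transformers and peft; the model name, rank, and alpha values are illustrative assumptions, not the actual script walked through in the video.

```python
# Sketch: attach a LoRA adapter to a 4-bit-quantized Mistral 7B base model.
# Model ID and hyperparameters are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "mistralai/Mistral-7B-v0.1"  # assumed base model

# Quantize the frozen base weights to 4-bit so the 7B model fits on one GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Only the low-rank adapter matrices are trained; lora_alpha scales their
# contribution relative to the frozen base weights (a "high LoRA alpha"
# gives the adapter more influence).
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The resulting PEFT model can then be passed to a supervised fine-tuning or DPO trainer for the preference-optimization stages described above.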

Fine-Tuning Self-Rewarding Language Models with Mistral 7B

Oxen