Deep Dive Into Self-Rewarding Language Models - Training Models as Their Own Judges
Oxen

1. What we're covering
2. The Problem With Human-Labeled Data
3. Super-human Agents and Synthetic Data
4. What Is a Self-Rewarding Language Model?
5. Skill 1: Instruction Following
6. Skill 2: LLM-as-a-Judge
7. Prompting as the Judge
8. Initialization and Datasets
9. Self-Instruction Creation
10. AI Feedback Training (AIFT) Data Creation
11. Iterative Training
12. Evaluation
13. Results
14. Conclusion
15. Join us!
Description:
Explore a technical lecture on Meta and NYU's research into Self-Rewarding Language Models, which removes the need for human-labeled reward data by having the model act as its own judge. Learn about the limitations of human-labeled data, the motivation behind super-human agents, and how self-rewarding models combine two skills: instruction following and LLM-as-a-Judge evaluation. The lecture covers model initialization, the datasets involved, self-instruction creation, and AI Feedback Training (AIFT) data creation, then walks through the evaluation methodology and the results demonstrating the effectiveness of this approach. Suitable for AI researchers, machine learning practitioners, and anyone following developments in natural language processing.
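To make the training loop the lecture describes concrete, here is a minimal Python sketch of one self-rewarding iteration: the model writes new instructions via self-instruction creation, generates several candidate responses per instruction, scores each candidate with an LLM-as-a-Judge prompt, and keeps the best/worst pair as preference data for DPO. The `Model` interface, the `generate` method and its parameters, and the exact prompt wording are illustrative assumptions, not the authors' code.

```python
# Sketch of one self-rewarding iteration (after Yuan et al., 2024).
# The model interface and prompt wording below are assumptions for
# illustration; only the overall procedure follows the paper.

import random

JUDGE_PROMPT = """Review the user's question and the corresponding response.
Award points using the additive 5-point rubric from the paper, then end
your reply with the line: "Score: <total points>"

Question: {prompt}
Response: {response}
"""

def parse_score(judge_output: str) -> float | None:
    """Extract the numeric score from the judge's final 'Score:' line."""
    for line in reversed(judge_output.strip().splitlines()):
        if line.startswith("Score:"):
            try:
                return float(line.split("Score:")[1].strip())
            except ValueError:
                return None
    return None

def self_rewarding_iteration(model, seed_prompts, n_candidates=4):
    """Self-instruct new prompts, generate and self-judge candidate
    responses, and return DPO preference pairs (the AIFT data)."""
    preference_pairs = []
    # 1. Self-instruction creation: few-shot prompt the model with
    #    existing prompts to produce new instructions.
    new_prompts = [
        model.generate(f"Come up with a new task similar to: "
                       f"{random.choice(seed_prompts)}")
        for _ in range(len(seed_prompts))
    ]
    for prompt in new_prompts:
        # 2. Sample several candidate responses per new prompt.
        candidates = [model.generate(prompt, temperature=0.7)
                      for _ in range(n_candidates)]
        # 3. LLM-as-a-Judge: the same model scores each candidate 0-5.
        scored = []
        for response in candidates:
            score = parse_score(model.generate(
                JUDGE_PROMPT.format(prompt=prompt, response=response)))
            if score is not None:
                scored.append((score, response))
        if len(scored) < 2:
            continue
        scored.sort(key=lambda pair: pair[0])
        # 4. Highest- vs. lowest-scored responses form one preference pair.
        (lo_score, rejected), (hi_score, chosen) = scored[0], scored[-1]
        if hi_score > lo_score:  # skip ties
            preference_pairs.append(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected})
    # 5. These pairs are used for DPO training to produce the next model.
    return preference_pairs
```

Repeating this loop, where each model M_t both generates and judges the preference data that trains its successor M_{t+1}, is the iterative training the lecture refers to: both instruction following and judging can improve from one iteration to the next.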
