Deep Dive Into Self-Rewarding Language Models - Training Models as Their Own Judges
Oxen

1. What we're covering
2. The Problem With Human-Labeled Data
3. Super-human Agents and Synthetic Data
4. What Is a Self-Rewarding Language Model?
5. Skill 1: Instruction Following
6. Skill 2: LLM-as-a-Judge
7. Prompting as the Judge
8. Initialization and Datasets
9. Self-Instruction Creation
10. AI Feedback Training (AIFT) Data Creation
11. Iterative Training
12. Evaluation
13. Results
14. Conclusion
15. Join us!
Description:
Explore a technical lecture on Meta and NYU's research into Self-Rewarding Language Models, which removes the need for human-labeled reward data by having the model act as its own judge. Learn about the limitations of human-labeled data, the motivation behind super-human agents, and how self-rewarding models combine two skills: instruction following and LLM-as-a-Judge evaluation. The lecture covers model initialization, the datasets involved, self-instruction creation, and AI Feedback Training (AIFT) data creation, then walks through the evaluation methodology and the results demonstrating the effectiveness of this approach. Suitable for AI researchers, machine learning practitioners, and anyone following developments in natural language processing.
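To make the training loop the lecture describes concrete, here is a minimal Python sketch of one self-rewarding iteration: the model writes new instructions via self-instruction creation, generates several candidate responses per instruction, scores each candidate with an LLM-as-a-Judge prompt, and keeps the best/worst pair as preference data for DPO. The `Model` interface, the `generate` method and its parameters, and the exact prompt wording are illustrative assumptions, not the authors' code.

```python
# Sketch of one self-rewarding iteration (after Yuan et al., 2024).
# The model interface and prompt wording below are assumptions for
# illustration; only the overall procedure follows the paper.

import random

JUDGE_PROMPT = """Review the user's question and the corresponding response.
Award points using the additive 5-point rubric from the paper, then end
your reply with the line: "Score: <total points>"

Question: {prompt}
Response: {response}
"""

def parse_score(judge_output: str) -> float | None:
    """Extract the numeric score from the judge's final 'Score:' line."""
    for line in reversed(judge_output.strip().splitlines()):
        if line.startswith("Score:"):
            try:
                return float(line.split("Score:")[1].strip())
            except ValueError:
                return None
    return None

def self_rewarding_iteration(model, seed_prompts, n_candidates=4):
    """Self-instruct new prompts, generate and self-judge candidate
    responses, and return DPO preference pairs (the AIFT data)."""
    preference_pairs = []
    # 1. Self-instruction creation: few-shot prompt the model with
    #    existing prompts to produce new instructions.
    new_prompts = [
        model.generate(f"Come up with a new task similar to: "
                       f"{random.choice(seed_prompts)}")
        for _ in range(len(seed_prompts))
    ]
    for prompt in new_prompts:
        # 2. Sample several candidate responses per new prompt.
        candidates = [model.generate(prompt, temperature=0.7)
                      for _ in range(n_candidates)]
        # 3. LLM-as-a-Judge: the same model scores each candidate 0-5.
        scored = []
        for response in candidates:
            score = parse_score(model.generate(
                JUDGE_PROMPT.format(prompt=prompt, response=response)))
            if score is not None:
                scored.append((score, response))
        if len(scored) < 2:
            continue
        scored.sort(key=lambda pair: pair[0])
        # 4. Highest- vs. lowest-scored responses form one preference pair.
        (lo_score, rejected), (hi_score, chosen) = scored[0], scored[-1]
        if hi_score > lo_score:  # skip ties
            preference_pairs.append(
                {"prompt": prompt, "chosen": chosen, "rejected": rejected})
    # 5. These pairs are used for DPO training to produce the next model.
    return preference_pairs
```

Repeating this loop, where each model M_t both generates and judges the preference data that trains its successor M_{t+1}, is the iterative training the lecture refers to: both instruction following and judging can improve from one iteration to the next.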
