Explore the fundamentals of Reinforcement Learning from Human Feedback (RLHF) and its application in cutting-edge AI tools like ChatGPT in this comprehensive one-hour talk. Delve into the interconnected machine learning models that make up the RLHF pipeline, covering essential concepts in Natural Language Processing and Reinforcement Learning. Gain insights into the three main components of RLHF: NLP pretraining, supervised fine-tuning, and reward model training. Examine technical details such as input-output pairs, KL divergence, and the PPO algorithm. Discover real-world examples, compare different AI models, and explore open questions in the field. Access additional resources, including a detailed blog post, an in-depth RL course, and presentation slides. Join speaker Nathan Lambert, a Research Scientist at HuggingFace with a PhD from UC Berkeley, as he shares his expertise and concludes with a Q&A session on the future of RLHF and its impact on AI development.
Reinforcement Learning from Human Feedback - From Zero to ChatGPT
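As background for the KL divergence and PPO details mentioned in the description, a common formulation of the RL fine-tuning objective (a sketch of the standard setup, not material quoted from the talk) penalizes the reward model's score with the KL divergence between the RL-tuned policy and the initial supervised policy:

    R(x, y) = r_\theta(x, y) - \beta \, D_{\mathrm{KL}}\big(\pi^{\mathrm{RL}}(y \mid x) \,\|\, \pi^{\mathrm{SFT}}(y \mid x)\big)

Here r_\theta is the learned reward model, \pi^{\mathrm{SFT}} is the supervised fine-tuned policy used as a reference, and \beta controls how far PPO may move the optimized policy \pi^{\mathrm{RL}} away from that reference.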