1. Introduction
2. Recent Breakthroughs
3. What Is RL
4. History of RL
5. Example of RL
6. ChatGPT
7. Technical Details
8. Three Conceptual Parts
9. NLP Pretraining
10. Supervised Finetuning
11. Reward Model Training
12. Input and Output Pairs
13. Reward Model
14. KL Divergence
15. Scaling Factor
16. RL Optimizer
17. PPO
18. Conceptual Questions
19. Prompts and Responses
20. Anthropic
21. BlenderBot
22. Thumbs Up and Thumbs Down
23. ChatGPT Example
24. ChatGPT vs Anthropic
25. Open Areas of Investigation
26. Wrap-up
27. Q&A
28. Open Source Community
29. Reinforcement Learning from Email
30. Paper Release
Description:
Explore the fundamentals of Reinforcement Learning from Human Feedback (RLHF) and its application in cutting-edge AI tools like ChatGPT in this comprehensive one-hour talk. Delve into the interconnected machine learning models, covering essential concepts in Natural Language Processing and Reinforcement Learning. Gain insights into the three main components of RLHF: NLP pretraining, supervised fine-tuning, and reward model training. Examine technical details such as input-output pairs, KL divergence, and the PPO algorithm. Discover real-world examples, compare different AI models, and explore open questions in the field. Access additional resources, including a detailed blog post, an in-depth RL course, and presentation slides. Join speaker Nathan Lambert, a Research Scientist at Hugging Face with a PhD from UC Berkeley, as he shares his expertise and concludes with a Q&A session on the future of RLHF and its impact on AI development.
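The chapters on the reward model, KL divergence, and the scaling factor fit together in the KL-penalized objective that the RL optimizer (e.g. PPO) maximizes. A sketch of that standard formulation follows; the notation is an assumption for illustration, not taken from the talk itself:

```latex
% Per-sample reward fed to the RL optimizer:
%   r_theta    - learned reward model scoring prompt x and response y
%   beta       - KL scaling factor controlling how far the policy may drift
%   pi_RL      - policy being fine-tuned with RL
%   pi_base    - frozen pretrained / supervised-finetuned model
R(x, y) = r_\theta(x, y)
        - \beta \, D_{\mathrm{KL}}\!\bigl(
            \pi_{\mathrm{RL}}(y \mid x) \,\big\|\, \pi_{\mathrm{base}}(y \mid x)
          \bigr)
```

The KL term keeps the tuned model close to the base model, so it cannot simply exploit weaknesses in the learned reward; the scaling factor β trades off reward maximization against that constraint.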

Reinforcement Learning from Human Feedback - From Zero to ChatGPT

HuggingFace