1. Gradient policy optimization
2. Recall Policy Gradient
3. Trust region method
4. Trust region for policies
5. Kullback-Leibler Divergence
6. Reformulation
7. Derivation (continued)
8. Trust Region Policy Optimization (TRPO)
9. Constrained Optimization
10. Simpler Objective
11. Proximal Policy Optimization (PPO)
12. Empirical Results
13. Illustration
Description:
A 22-minute video lecture from the CS885 course at the University of Waterloo on trust region methods and proximal policy optimization. It recalls the policy gradient, introduces trust region methods and the Kullback-Leibler divergence, and derives the Trust Region Policy Optimization (TRPO) algorithm as a constrained optimization problem. It then presents the simpler objective used by Proximal Policy Optimization (PPO), followed by empirical results and an illustration. Accompanying slides are available from the course website.
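
For readers who want a concrete handle on two quantities named in the description, here is a minimal sketch in plain Python. It is not taken from the lecture or its slides. It assumes discrete action distributions, the commonly used form of the PPO clipped surrogate, and a clip range of 0.2; the function names and example numbers are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumes q[i] > 0 wherever p[i] > 0; terms with p[i] == 0 contribute nothing.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ppo_clipped_term(ratio, advantage, epsilon=0.2):
    """Clipped surrogate for one sample: min(r * A, clip(r, 1 - eps, 1 + eps) * A).

    ratio is the new-policy probability of the action divided by the
    old-policy probability; epsilon=0.2 is an assumed, illustrative value.
    """
    clipped_ratio = max(1.0 - epsilon, min(ratio, 1.0 + epsilon))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical action distributions before and after a policy update.
old_policy = [0.5, 0.3, 0.2]
new_policy = [0.6, 0.25, 0.15]

# A trust region method would keep this value below a chosen threshold.
print("KL(old || new):", kl_divergence(old_policy, new_policy))

# With a positive advantage, gains from raising the ratio past 1 + epsilon are cut off.
print("clipped term:", ppo_clipped_term(ratio=1.5, advantage=1.0))
```

The first function measures how far an update moves the policy, which is the quantity a trust region constraint bounds. The second shows how clipping the probability ratio removes the incentive to move far from the old policy without an explicit constraint.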

Trust Region & Proximal Policy Optimization

Pascal Poupart