1. Intro
2. Reinforcement Learning
3. Problems of Policy Gradient
4. RL to Optimization
5. What loss to optimize?
6. New State Visitation is Difficult
7. Minorization Maximization (MM) algorithm
8. Solving KL-Penalized Problem
9. Conjugate Gradient (CG)
10. TRPO: KL-Constrained
11. TRPO Algorithm
Description:
A 23-minute lecture by Shivam Kalra on the Trust Region Policy Optimization (TRPO) algorithm. It reviews reinforcement learning and the problems of policy gradient methods, recasts reinforcement learning as an optimization problem, and asks which loss to optimize. It then covers the Minorization Maximization (MM) algorithm, the KL-penalized problem, and the Conjugate Gradient (CG) method, and closes with the KL-constrained formulation of TRPO and the TRPO algorithm itself.

Trust Region Policy Optimization

Pascal Poupart