A closer look at Natural Policy Gradient • NPG performs the update (standard form sketched below the outline)
Assumptions on policies
Extension to finite samples
Looking ahead
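For reference, a sketch of the NPG update named in the outline above, in its standard form (the notation here is assumed rather than taken from the slides: η is a step size, F(θ) the Fisher information matrix of the policy π_θ, † the Moore-Penrose pseudoinverse, and V the value objective being ascended):

\[
\theta^{(t+1)} \;=\; \theta^{(t)} \;+\; \eta\, F\big(\theta^{(t)}\big)^{\dagger}\, \nabla_{\theta} V\big(\theta^{(t)}\big),
\qquad
F(\theta) \;=\; \mathbb{E}_{s,a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, \big(\nabla_\theta \log \pi_\theta(a \mid s)\big)^{\top} \right].
\]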
Description:
Explore the intricacies of policy gradient methods in Markov Decision Processes through this 55-minute lecture by Alekh Agarwal from Microsoft Research Redmond. Delve into optimality and approximation concepts as part of the "Emerging Challenges in Deep Learning" series at the Simons Institute. Examine MDP preliminaries, policy parameterizations, and the policy gradient algorithm, with a focus on softmax parameterization and entropy regularization. Analyze the convergence of entropy-regularized PGA, natural solutions, and the accompanying proof ideas. Investigate restricted parameterizations, natural policy gradient updates, policy assumptions, and extensions to finite samples. Gain valuable insights into this crucial area of deep learning and reinforcement learning research.
Optimality and Approximation with Policy Gradient Methods in Markov Decision Processes