Outline:
1. Intro
2. Tabular Markov decision process
3. Prior efforts: algorithms and sample complexity results
4. Minimax optimal sample complexity of tabular MDP
5. Adding some structure: state feature map
6. Representing value function using linear combination of features
7. Rethinking Bellman equation
8. Reducing Bellman equation using features
9. Sample complexity of RL with features
10. Off-Policy Policy Evaluation (OPE)
11. OPE with function approximation
12. Equivalence to plug-in estimation
13. Minimax-optimal batch policy evaluation
14. Lower Bound Analysis
15. Episodic Reinforcement Learning
16. Feature space embedding of transition kernel
17. Regret Analysis
18. Exploration with Value-Targeted Regression (VTR)
Description:
Explore the statistical complexities of reinforcement learning in this 47-minute lecture by Mengdi Wang from Princeton University. Delve into key theoretical questions surrounding RL, including sample complexity, regret analysis, and off-policy evaluation. Examine recent findings on minimax-optimal sample complexities for solving Markov Decision Processes, optimal off-policy evaluation through regression, and regret bounds for online RL with nonparametric model estimation. Gain insights into tabular MDPs, state feature mapping, Bellman equation reduction, and episodic reinforcement learning. Understand the importance of feature space embedding of transition kernels and exploration techniques like Value-Targeted Regression. This talk, part of the Intersections between Control, Learning and Optimization 2020 series at the Institute for Pure & Applied Mathematics, offers a comprehensive overview of recent advancements in the theoretical foundations of reinforcement learning.
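As a brief illustration of the "Reducing Bellman equation using features" topic in the outline (the notation below is generic and not taken from the slides): for a fixed policy pi with discount factor gamma, the value function satisfies the Bellman equation, and if it is well approximated by a linear combination of d state features phi(s) with coefficients theta, the fixed-point condition can be written as

\[
V^{\pi}(s) \;=\; \mathbb{E}\big[\, r(s,a) + \gamma\, V^{\pi}(s') \;\big|\; s,\ a \sim \pi(\cdot \mid s) \,\big],
\qquad
V^{\pi}(s) \;\approx\; \sum_{j=1}^{d} \theta_j\, \phi_j(s) \;=\; \phi(s)^{\top}\theta .
\]

Substituting the linear form into the Bellman equation replaces a fixed-point problem over all states with one over the d-dimensional parameter theta, which is the mechanism behind sample-complexity bounds that scale with the feature dimension d rather than with the size of the state space.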

On the Statistical Complexity of Reinforcement Learning

Institute for Pure & Applied Mathematics (IPAM)