1. Intro
2. Sequential Decision Making Under Uncertainty
3. Learning to Make Good Sequences of Decisions Under Uncertainty → 1980s Reinforcement Learning
4. Background: Markov Decision Process Value Function
5. Background: Reinforcement Learning
6. Counterfactual / Batch Off-Policy Reinforcement Learning
7. Need for Generalization
8. Growing Interest in Causal Inference & ML
9. Batch / Counterfactual Policy Optimization: Pick Policy w/ Best Estimated Expected Sum of Rewards
10. Quest: Batch Policy Optimization w/ Generalization Bounds
11. Challenge: Good Error Bound Analysis
12. Aim: Strong Generalization Guarantees on Policy Performance; Alternative: Guarantee to Find a Good in-Class Policy
13. Off-Policy Policy Gradient with State Distribution Correction
14. Aim: Strong Generalization Guarantees on Policy Performance; Alternative: Guarantee to Find the Best in-Class Policy
15. Example: Linear Thresholding Policies (Starting HIV treatment as soon as…)
16. Use an Advantage Decomposition
17. Use a Doubly Robust Advantage Decomposition
18. Quest for Batch Policy Optimization with Generalization Guarantees
19. Techniques to Minimize & Understand Data Needed to Learn to Make Good Decisions
Description:
Explore sequential decision-making under uncertainty in this 41-minute lecture by Emma Brunskill of Stanford University. Delve into emerging challenges for deep learning, focusing on counterfactual and batch reinforcement learning. Learn about Markov Decision Processes, value functions, and the growing interest in causal inference and machine learning. Examine batch policy optimization with generalization bounds, off-policy policy gradient techniques, and methods for minimizing the data needed to make good decisions. Gain insight into linear thresholding policies, advantage decomposition, and doubly robust advantage decomposition in the context of real-world applications such as HIV treatment initiation.
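As a minimal illustration of the counterfactual (batch, off-policy) evaluation idea the description mentions, the sketch below uses ordinary importance sampling to estimate a target policy's expected return from data logged under a different behavior policy. This is not code from the lecture; the function names and toy data are invented for illustration, and both policies are assumed to return a dict of action probabilities.

```python
def is_estimate(trajectories, target_policy, behavior_policy):
    """Ordinary importance-sampling estimate of the target policy's
    expected return, computed from trajectories logged under the
    behavior policy. Each trajectory is a list of (state, action,
    reward) triples."""
    total = 0.0
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for state, action, reward in traj:
            # Reweight by how much more (or less) likely the target
            # policy was to take the logged action than the behavior
            # policy that actually generated the data.
            weight *= target_policy(state)[action] / behavior_policy(state)[action]
            ret += reward
        total += weight * ret
    return total / len(trajectories)

# Toy one-step example: the behavior policy picks actions 0/1 uniformly,
# the target policy always picks action 1, and the reward equals the action.
data = [[("s", 0, 0.0)], [("s", 1, 1.0)]]
behavior = lambda s: {0: 0.5, 1: 0.5}
target = lambda s: {0: 0.0, 1: 1.0}
print(is_estimate(data, target, behavior))  # → 1.0, the target policy's true value
```

The estimator is unbiased but its variance grows quickly with horizon length, which is one motivation for the doubly robust and state-distribution-corrected methods listed in the outline.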

Better Learning from the Past - Counterfactual - Batch RL

Simons Institute