Background: Distributional RL for Policy Evaluation & Control
Background: Distributional Bellman Policy Evaluation Operator for Value Based Distributional RL
Maximal Form of Wasserstein Metric on 2 Distributions
Distributional Bellman Backup Operator for Control for Maximizing Expected Reward is Not a Contraction
Goal: Quickly and Efficiently use RL to Learn a Risk-Sensitive Policy using Conditional Value at Risk
Conditional Value at Risk for a Decision Policy
For Inspiration, look to Sample Efficient Learning for Policies that Optimize Expected Reward
Optimism Under Uncertainty for Standard RL: Use Concentration Inequalities
Suggests a Path for Sample Efficient Risk Sensitive RL
Use DKW Concentration Inequality to Quantify Uncertainty over Distribution
Creating an Optimistic Estimate of Distribution of Returns
Optimism Operator Over CDF of Returns
Optimistic Operator for Policy Evaluation Yields Optimistic Estimate
Concerns about Optimistic Risk Sensitive RL
Optimistic Exploration for Risk Sensitive RL in Continuous Spaces
Recall Optimistic Operator for Distribution of Returns for Discrete State Spaces, Uses Counts
Optimistic Operator for Distribution of Returns for Continuous State Spaces, Uses Pseudo-Counts
Simulation Experiments
Baseline Algorithms
Simulation Domains
Machine Replacement, Risk Level α = 0.25
HIV Treatment
Blood Glucose Simulator, Adult #5
Blood Glucose Simulator, 3 Patients
A Sidenote on Safer Exploration: Faster Learning also Reduces # of Bad Events During Learning
Many Interesting Open Directions
Optimism for Conservatism: Fast RL for Learning Conditional Value at Risk Policies
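
The quantities named in the outline above can be written compactly. As a minimal sketch in standard notation (the slides' own notation may differ): the distributional Bellman policy evaluation operator, the maximal form of the Wasserstein metric, the conditional value at risk of the return Z at level α, and the Dvoretzky-Kiefer-Wolfowitz (DKW) bound on the empirical CDF of returns are

\[
(\mathcal{T}^{\pi} Z)(s,a) \stackrel{D}{=} R(s,a) + \gamma\, Z(S',A'), \qquad S' \sim P(\cdot\mid s,a),\; A' \sim \pi(\cdot\mid S'),
\]
\[
\bar{d}_p(Z_1,Z_2) = \sup_{s,a} W_p\big(Z_1(s,a),\, Z_2(s,a)\big),
\]
\[
\mathrm{CVaR}_{\alpha}(Z) = \mathbb{E}\big[Z \mid Z \le \mathrm{VaR}_{\alpha}(Z)\big] = \frac{1}{\alpha}\int_{0}^{\alpha} F_Z^{-1}(u)\,du \quad \text{(continuous case)},
\]
\[
\Pr\Big(\sup_{x}\big|\hat{F}_n(x) - F(x)\big| > \varepsilon\Big) \le 2e^{-2n\varepsilon^2},
\qquad\text{so}\qquad
\sup_{x}\big|\hat{F}_n(x) - F(x)\big| \le \sqrt{\tfrac{\ln(2/\delta)}{2n}} \;\text{ with probability at least } 1-\delta.
\]

One standard way to realize the "optimism operator over the CDF of returns" named above is to shift the empirical CDF down by this DKW radius (clipping at zero and keeping total mass one), which yields a return distribution that stochastically dominates the empirical one and hence an optimistic CVaR estimate; in continuous state spaces the visit count n is replaced by a pseudo-count.
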
Description:
Explore a comprehensive lecture on structural risk minimization in reinforcement learning delivered by Emma Brunskill from Stanford University at the Institute for Advanced Study. Delve into the importance of risk-sensitive control, distributional reinforcement learning, and the application of conditional value at risk for decision policies. Examine optimism under uncertainty techniques, concentration inequalities, and their implications for sample-efficient risk-sensitive reinforcement learning. Investigate optimistic exploration methods for both discrete and continuous state spaces, and review simulation experiments across various domains including machine replacement, HIV treatment, and blood glucose management. Gain insights into safer exploration strategies and discover potential future research directions in this cutting-edge field of artificial intelligence and machine learning.
Towards Structural Risk Minimization for RL - Emma Brunskill
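
As a companion to the outline, here is a short, self-contained Python sketch of that DKW-based optimistic CDF and the CVaR it induces. The function names (optimistic_cdf, cvar_from_cdf), the v_max cap, and the clipping details are illustrative assumptions rather than the speaker's code.

import numpy as np

def optimistic_cdf(returns, delta=0.05, v_max=10.0):
    """Empirical CDF of observed returns, shifted down by the DKW radius.

    A pointwise-lower CDF stochastically dominates the empirical one, so the
    CVaR computed from it is an optimistic (high-probability upper) estimate.
    For continuous state spaces, n would be a pseudo-count rather than a visit count.
    """
    values = np.sort(np.asarray(returns, dtype=float))
    n = len(values)
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # DKW confidence radius
    cdf = np.clip(np.arange(1, n + 1) / n - eps, 0.0, 1.0)
    # Put the probability mass removed by the downward shift on the best achievable return.
    return np.append(values, v_max), np.append(cdf, 1.0)

def cvar_from_cdf(values, cdf, alpha=0.25):
    """CVaR_alpha: expected value of the worst alpha-fraction of a discrete return distribution."""
    probs = np.diff(np.concatenate(([0.0], cdf)))    # point mass at each sorted value
    acc, cum = 0.0, 0.0
    for v, p in zip(values, probs):
        if cum >= alpha:
            break
        take = min(p, alpha - cum)                   # only tail mass up to alpha counts
        acc += take * v
        cum += take
    return acc / alpha

# Usage: returns observed so far for one (state, action) pair.
rng = np.random.default_rng(0)
returns = rng.normal(loc=1.0, scale=2.0, size=50)
vals, cdf = optimistic_cdf(returns, delta=0.05, v_max=10.0)
emp_cdf = np.arange(1, len(returns) + 1) / len(returns)
print("optimistic CVaR_0.25:", cvar_from_cdf(vals, cdf, alpha=0.25))
print("empirical  CVaR_0.25:", cvar_from_cdf(np.sort(returns), emp_cdf, alpha=0.25))

Because the shifted CDF moves probability mass toward higher returns, the optimistic CVaR printed above is at least as large as the empirical one, which is exactly the kind of optimistic estimate the outline attributes to the policy evaluation operator.
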