FORMAL PROTOCOL — Online learning in a fixed MDP: for each round t = 1, 2, ..., the learner observes a state X_t ∈ X ... (a rough sketch of the full interaction loop follows the outline)
TEMPORAL DEPENDENCIES
REGRET DECOMPOSITION
THE DRIFT TERMS
LOCAL-TO-GLOBAL
THE MDP-EXPERT ALGORITHM
GUARANTEES FOR MDP-E
BANDIT FEEDBACK
ONLINE LINEAR OPTIMIZATION
ONLINE MIRROR DESCENT
THE ONLINE REPS ALGORITHM (O-REPS)
GUARANTEES FOR O-REPS
COMPARISON OF GUARANTEES
MDP-E WITH FUNCTION APPROXIMATION — MDP-E only needs a good approximation of the action-value functions
O-REPS WITH UNCERTAIN MODELS
OUTLOOK
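The first outline entry survives only in truncated form, so here is a minimal Python sketch of the kind of interaction loop it describes: a fixed transition kernel, reward functions chosen by a possibly adversarial environment each round, and a learner that observes the state, acts, and updates its policy. All names (online_mdp_protocol, policy_update, reward_fns) and the full-information feedback model shown here are illustrative assumptions, not the lecture's own notation.

```python
import numpy as np

def online_mdp_protocol(P, reward_fns, policy_update, pi_init, x0, rng=None):
    """Sketch of the online-learning-in-a-fixed-MDP interaction loop.

    P             -- fixed transition kernel, shape (S, A, S); each P[x, a] sums to 1
    reward_fns    -- iterable of reward tables r_t of shape (S, A), one per round,
                     chosen by the (possibly adversarial) environment
    policy_update -- learner's update rule: (pi, r_t, x_t, a_t) -> new policy
    pi_init       -- initial policy, shape (S, A); each row is a distribution over actions
    x0            -- index of the initial state X_1
    """
    rng = rng if rng is not None else np.random.default_rng()
    num_states, num_actions, _ = P.shape
    pi, x = pi_init, x0
    total_reward = 0.0
    for r_t in reward_fns:                      # round t = 1, 2, ...
        a = rng.choice(num_actions, p=pi[x])    # learner draws A_t ~ pi_t(. | X_t)
        total_reward += r_t[x, a]               # receives reward r_t(X_t, A_t)
        pi = policy_update(pi, r_t, x, a)       # learner updates its policy
        x = rng.choice(num_states, p=P[x, a])   # next state X_{t+1} ~ P(. | X_t, A_t)
    return total_reward
```

For instance, passing policy_update=lambda pi, r, x, a: pi keeps the policy fixed and returns the total reward a fixed comparator policy would collect, which is the quantity the regret definitions in the lecture compare against.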
Description:
Explore the intricacies of online learning in Markov Decision Processes (MDPs) in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Delve into topics such as adversarial scenarios, performance measures, oblivious and non-oblivious adversaries, and the challenges of learning with changing transitions. Examine the formal protocol for online learning in fixed MDPs, temporal dependencies, and regret decomposition. Discover the MDP-Expert algorithm, its guarantees, and applications in bandit feedback scenarios. Investigate online linear optimization, mirror descent, and the Online Relative Entropy Policy Search (O-REPS) algorithm. Compare various guarantees and explore function approximation in MDP-E. Gain insights into O-REPS with uncertain models and consider future directions in this field of study.
Online Learning in Markov Decision Processes - Part 2