1. Intro
2. The Basic Bandit Game
3. Bandits are Super Simple MDPs
4. The Regret
5. Adversarial Protocol
6. Algorithm Design Principle: Exponential Weights
7. Exp3: Abridged Analysis
8. Exp3: Analysis
9. Upgrades
10. Warm-up: Explore-Then-Commit
11. Algorithm Design Principle: OFU
12. UCB Illustration
13. UCB: Analysis
14. Algorithm Design Principle: Probability Matching
15. Thompson Sampling: Overview
16. Thompson Sampling: Upper Bound
17. Thompson Sampling: Proof Outline
18. Best of Both Worlds
19. Two Settings
20. Algorithm Design Principle: Action Elimination
21. Successive Elimination Analysis
22. Bonus: Linear Contextual Bandits
23. Algorithm Design Principle: Optimism
24. Review
Description:
Delve into the intricacies of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Explore fundamental concepts such as the basic bandit game, regret analysis, and adversarial protocols. Learn key algorithm design principles, including exponential weights, optimism in the face of uncertainty, and probability matching. Examine popular algorithms such as Exp3, UCB, and Thompson Sampling, along with their analyses and upper bounds. Investigate advanced topics including best-of-both-worlds guarantees, successive elimination, and linear contextual bandits. Gain insights from Alan Malek of DeepMind and Wouter Koolen of Centrum Wiskunde & Informatica as they guide you through this essential area of reinforcement learning theory.

Online Learning and Bandits - Part 2

Simons Institute