1. Intro
2. The Basic Bandit Game
3. Bandits are Super Simple MDPs
4. The Regret
5. Adversarial Protocol
6. Algorithm Design Principle: Exponential Weights
7. Exp3: Abridged Analysis
8. Exp3: Analysis
9. Upgrades
10. Warm-up: Explore-Then-Commit
11. Algorithm Design Principle: OFU
12. UCB Illustration
13. UCB: Analysis
14. Algorithm Design Principle: Probability Matching
15. Thompson Sampling: Overview
16. Thompson Sampling: Upper Bound
17. Thompson Sampling: Proof Outline
18. Best of Both Worlds
19. Two Settings
20. Algorithm Design Principle: Action Elimination
21. Successive Elimination Analysis
22. Bonus: Linear Contextual Bandits
23. Algorithm Design Principle: Optimism
24. Review
Description:
Delve into the intricacies of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Explore fundamental concepts such as the basic bandit game, regret analysis, and adversarial protocols. Learn key algorithm design principles, including exponential weights, optimism in the face of uncertainty, and probability matching. Examine popular algorithms such as Exp3, UCB, and Thompson Sampling, along with their analyses and upper bounds. Investigate advanced topics including best-of-both-worlds guarantees, successive elimination, and linear contextual bandits. Gain insights from Alan Malek of DeepMind and Wouter Koolen of Centrum Wiskunde & Informatica as they guide you through this essential area of reinforcement learning theory.

Online Learning and Bandits - Part 2

Simons Institute