EW Analysis: applying Hoeffding's Lemma to the loss at each round
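The slide's formula did not survive extraction; as a hedged reconstruction, the textbook application of Hoeffding's Lemma to Exponential Weights (learning rate \eta, K experts, losses in [0,1]) gives:

```latex
\mathrm{Regret}_T \;\le\; \frac{\ln K}{\eta} + \frac{\eta T}{8},
\qquad \text{and with } \eta = \sqrt{8\ln K / T}: \qquad
\mathrm{Regret}_T \;\le\; \sqrt{\tfrac{1}{2}\, T \ln K}.
```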
Summary so far: balancing model complexity vs. overfitting
FTRL/MD "sneak peek"
FTRL/MD sneak peek performance. Algorithm: Follow the Regularised Leader (FTRL)
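As a minimal illustration of the FTRL template named above (not code from the lecture; a sketch assuming linearized losses and a Euclidean regularizer, with illustrative names):

```python
import numpy as np

def ftrl_quadratic(grads, eta=0.1):
    """Follow the Regularised Leader with linearized losses and the
    L2 regularizer R(x) = ||x||^2 / (2*eta): the minimizer of
    <sum_{s<t} g_s, x> + R(x) has the closed form x_t = -eta * sum_{s<t} g_s."""
    cum = np.zeros_like(np.asarray(grads[0], dtype=float))
    plays = []
    for g in grads:
        plays.append(-eta * cum)                # decision made before seeing g_t
        cum = cum + np.asarray(g, dtype=float)  # then observe gradient g_t
    return plays
```

With this regularizer FTRL coincides with lazy online gradient descent, which is why the closed form above is just a scaled running gradient sum.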
Quadratic Losses
Curvature assumptions
ONS Algorithm
ONS Performance
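A hedged sketch of the Online Newton Step update discussed above (not the lecture's code; the projection back onto the feasible set is omitted, so this is the unconstrained variant):

```python
import numpy as np

def online_newton_step(x0, grad_fn, T, gamma=1.0, eps=1.0):
    """Unconstrained Online Newton Step sketch: accumulate outer products
    of observed gradients in A and take a Newton-like step using A^{-1}.
    grad_fn(x, t) returns the gradient of the round-t loss at x."""
    d = len(x0)
    A = eps * np.eye(d)                 # regularized second-order statistic
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad_fn(x, t)
        A = A + np.outer(g, g)          # rank-one curvature update
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x
```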
ONS Discussion
Offline Optimisation
Online to Batch (assumption: stochastic setting)
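The online-to-batch conversion named above can be sketched as follows (illustrative code, not from the lecture; assuming i.i.d. samples and convex losses, the average iterate has expected excess risk at most the expected regret divided by T):

```python
import numpy as np

def online_to_batch(sample_grad, x0, T, eta=0.1):
    """Run online gradient descent on stochastic gradients and return the
    average of the T played points, which is the batch solution produced
    by the standard online-to-batch conversion."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(T):
        g = sample_grad(x)              # stochastic gradient at the current play
        x = x - eta * g
        iterates.append(x.copy())
    return np.mean(iterates[:-1], axis=0)  # average of the T played points
```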
Computing Saddle Points
Application 3: Saddle Point Algorithm. Algorithm: approximate saddle point solver
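A hedged sketch of the regret-to-saddle-point reduction for a bilinear game min_x max_y x^T A y (illustrative names, not the lecture's code): both players take online gradient steps against each other, and the averaged plays approximate a saddle point, with duality gap bounded by the sum of the players' average regrets.

```python
import numpy as np

def approx_saddle_point(A, T=5000, eta=0.01):
    """Both players of min_x max_y x^T A y run online gradient
    descent/ascent against each other; return the averaged plays."""
    m, n = A.shape
    x, y = np.ones(m), np.ones(n)   # arbitrary starting plays
    xs, ys = [], []
    for _ in range(T):
        xs.append(x.copy()); ys.append(y.copy())
        gx = A @ y                   # gradient of x^T A y for the min player
        gy = A.T @ x                 # gradient for the max player (ascent)
        x, y = x - eta * gx, y + eta * gy
    return np.mean(xs, axis=0), np.mean(ys, axis=0)
```

Note that the last iterates of this dynamic can spiral away on bilinear games; it is the averaging that yields convergence, mirroring the regret-based analysis.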
Application 3: Saddle Point Analysis
Conclusion
Description:
Explore the fundamentals of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Delve into key concepts such as full information online learning, online gradient descent, exponential weights algorithm, and follow the regularized leader. Examine the balance between model complexity and overfitting, and discover applications in offline optimization and computing saddle points. Learn from experts Alan Malek of DeepMind and Wouter Koolen from Centrum Wiskunde & Informatica as they guide you through working definitions, design principles, and performance analyses of various algorithms. Gain valuable insights into the theoretical foundations of reinforcement learning and their practical implications in this hour-long presentation.