EW Analysis: applying Hoeffding's Lemma to the loss at each round
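The slide's formula did not survive extraction; as a hedged reconstruction, the textbook application of Hoeffding's Lemma to Exponential Weights (learning rate \eta, K experts, losses in [0,1]) gives:

```latex
\mathrm{Regret}_T \;\le\; \frac{\ln K}{\eta} + \frac{\eta T}{8},
\qquad \text{and with } \eta = \sqrt{8\ln K / T}: \qquad
\mathrm{Regret}_T \;\le\; \sqrt{\tfrac{1}{2}\, T \ln K}.
```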
Summary so far: balancing model complexity vs. overfitting
FTRL/MD "sneak peek"
FTRL/MD sneak peek performance. Algorithm: Follow the Regularised Leader (FTRL)
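As a minimal illustration of the FTRL template named above (not code from the lecture; a sketch assuming linearized losses and a Euclidean regularizer, with illustrative names):

```python
import numpy as np

def ftrl_quadratic(grads, eta=0.1):
    """Follow the Regularised Leader with linearized losses and the
    L2 regularizer R(x) = ||x||^2 / (2*eta): the minimizer of
    <sum_{s<t} g_s, x> + R(x) has the closed form x_t = -eta * sum_{s<t} g_s."""
    cum = np.zeros_like(np.asarray(grads[0], dtype=float))
    plays = []
    for g in grads:
        plays.append(-eta * cum)                # decision made before seeing g_t
        cum = cum + np.asarray(g, dtype=float)  # then observe gradient g_t
    return plays
```

With this regularizer FTRL coincides with lazy online gradient descent, which is why the closed form above is just a scaled running gradient sum.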
Quadratic Losses
Curvature assumptions
ONS Algorithm
ONS Performance
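A hedged sketch of the Online Newton Step update discussed above (not the lecture's code; the projection back onto the feasible set is omitted, so this is the unconstrained variant):

```python
import numpy as np

def online_newton_step(x0, grad_fn, T, gamma=1.0, eps=1.0):
    """Unconstrained Online Newton Step sketch: accumulate outer products
    of observed gradients in A and take a Newton-like step using A^{-1}.
    grad_fn(x, t) returns the gradient of the round-t loss at x."""
    d = len(x0)
    A = eps * np.eye(d)                 # regularized second-order statistic
    x = np.asarray(x0, dtype=float)
    for t in range(T):
        g = grad_fn(x, t)
        A = A + np.outer(g, g)          # rank-one curvature update
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
    return x
```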
ONS Discussion
Offline Optimisation
Online to Batch (assumption: stochastic setting)
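The online-to-batch conversion named above can be sketched as follows (illustrative code, not from the lecture; assuming i.i.d. samples and convex losses, the average iterate has expected excess risk at most the expected regret divided by T):

```python
import numpy as np

def online_to_batch(sample_grad, x0, T, eta=0.1):
    """Run online gradient descent on stochastic gradients and return the
    average of the T played points, which is the batch solution produced
    by the standard online-to-batch conversion."""
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(T):
        g = sample_grad(x)              # stochastic gradient at the current play
        x = x - eta * g
        iterates.append(x.copy())
    return np.mean(iterates[:-1], axis=0)  # average of the T played points
```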
Computing Saddle Points
Application 3: Saddle Point Algorithm. Algorithm: approximate saddle point solver
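A hedged sketch of the regret-to-saddle-point reduction for a bilinear game min_x max_y x^T A y (illustrative names, not the lecture's code): both players take online gradient steps against each other, and the averaged plays approximate a saddle point, with duality gap bounded by the sum of the players' average regrets.

```python
import numpy as np

def approx_saddle_point(A, T=5000, eta=0.01):
    """Both players of min_x max_y x^T A y run online gradient
    descent/ascent against each other; return the averaged plays."""
    m, n = A.shape
    x, y = np.ones(m), np.ones(n)   # arbitrary starting plays
    xs, ys = [], []
    for _ in range(T):
        xs.append(x.copy()); ys.append(y.copy())
        gx = A @ y                   # gradient of x^T A y for the min player
        gy = A.T @ x                 # gradient for the max player (ascent)
        x, y = x - eta * gx, y + eta * gy
    return np.mean(xs, axis=0), np.mean(ys, axis=0)
```

Note that the last iterates of this dynamic can spiral away on bilinear games; it is the averaging that yields convergence, mirroring the regret-based analysis.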
Application 3: Saddle Point Analysis
Conclusion
Description:
Explore the fundamentals of online learning and bandit algorithms in this comprehensive lecture from the Theory of Reinforcement Learning Boot Camp. Delve into key concepts such as full information online learning, online gradient descent, exponential weights algorithm, and follow the regularized leader. Examine the balance between model complexity and overfitting, and discover applications in offline optimization and computing saddle points. Learn from experts Alan Malek of DeepMind and Wouter Koolen from Centrum Wiskunde & Informatica as they guide you through working definitions, design principles, and performance analyses of various algorithms. Gain valuable insights into the theoretical foundations of reinforcement learning and their practical implications in this hour-long presentation.