1. Intro
2. Bandit Problem
3. Our focus: beyond linearity and concavity
4. Problem I: the Stochastic Bandit Eigenvector Problem
5. Some related work
6. Information-theoretic understanding
7. Beyond cubic dimension dependence
8. Our method: noisy power method
9. Problem II: Stochastic low-rank linear reward
10. Our algorithm: noisy subspace iteration
11. Regret comparisons: quadratic reward
12. Higher-order problems
13. Problem III: Symmetric high-order polynomial bandit
14. Problem IV: Asymmetric high-order polynomial bandit
15. Lower bound: Optimal dependence on d
16. Overall Regret Comparisons
17. Extension to RL in simulator setting
18. Conclusions: We find optimal regret for different types of reward functions
19. Future directions
Description:
Explore a 31-minute lecture by Qi Lei of Princeton University on optimal gradient-based algorithms for non-concave bandit optimization. Delve into advanced topics including the stochastic bandit eigenvector problem, noisy power methods, and stochastic low-rank linear rewards. Examine the information-theoretic understanding of these problems, regret comparisons for quadratic rewards, and higher-order polynomial bandit problems. Learn about optimal regret for different types of reward functions and potential extensions to reinforcement learning in the simulator setting. Gain insights into cutting-edge research on sampling algorithms and geometries on probability distributions presented at the Simons Institute.
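The outline names a "noisy power method" as the core technique for the stochastic bandit eigenvector problem. For intuition, here is a minimal sketch of generic noisy power iteration: estimating the top eigenvector of an unseen symmetric matrix when each matrix-vector product is observed only through a noisy oracle. The oracle `noisy_matvec`, the Gaussian noise model, and the iteration count are illustrative assumptions, not the exact algorithm from the lecture.

```python
import numpy as np

def noisy_power_method(noisy_matvec, dim, n_iters=50, rng=None):
    """Estimate the top eigenvector of an unseen symmetric matrix M,
    given only a noisy oracle x -> M @ x + noise (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(dim)
    x /= np.linalg.norm(x)
    for _ in range(n_iters):
        y = noisy_matvec(x)           # noisy estimate of M @ x
        x = y / np.linalg.norm(y)     # renormalize, as in exact power iteration
    return x

# Usage: a rank-1 reward matrix M = v v^T observed with small Gaussian noise.
dim = 10
v = np.random.default_rng(0).standard_normal(dim)
v /= np.linalg.norm(v)
M = np.outer(v, v)
oracle = lambda x: M @ x + 0.01 * np.random.default_rng().standard_normal(dim)
v_hat = noisy_power_method(oracle, dim)
print(abs(v_hat @ v))  # close to 1 when the noise is small
```

The "noisy subspace iteration" mentioned for the low-rank linear reward problem is the natural rank-k generalization of this idea, tracking a k-dimensional subspace (e.g., via QR re-orthonormalization) instead of a single vector.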

Optimal Gradient-Based Algorithms for Non-Concave Bandit Optimization

Simons Institute