RL in User-Facing/Interactive Systems: RL has found tremendous success with deep models
Some Challenges in User-Facing RL (RecSys): Scale • Number of users (multi-user MDPs) & number of actions (combinatorial, slates) • Idiosyncratic nature of actions
I. Stochastic Action Sets
SAS-MDPs: Constructing an MDP
SAS-MDPs: Solving Extended MDP
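The construction is only named here, not spelled out. As a rough illustration of the core idea (not the lecture's exact method), below is a minimal tabular Q-learning sketch, with a hypothetical environment interface, in which both action selection and the Bellman backup range over the action subset that actually became available at each step rather than the full action space.

import random
from collections import defaultdict

def q_learning_sas(env, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-values over the full (state, action) space; only available actions are ever queried.
    Q = defaultdict(float)
    for _ in range(episodes):
        state, available = env.reset()  # hypothetical interface: reset() also returns the realized action set
        done = False
        while not done:
            # Epsilon-greedy, restricted to the currently available actions.
            if random.random() < epsilon:
                action = random.choice(list(available))
            else:
                action = max(available, key=lambda a: Q[(state, a)])
            next_state, reward, done, next_available = env.step(action)
            # Bellman backup: max over the *next* realized action set, not the full action space.
            target = reward
            if not done:
                target += gamma * max(Q[(next_state, a)] for a in next_available)
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, available = next_state, next_available
    return Q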
II. User Learning over Long Horizons: Evidence of (very) slow user learning and adaptation
Advantage Amplification: Temporal aggregation (e.g., fixed actions) can help amplify advantages
Advantage Amplification: Temporal aggregation (e.g., fixed actions) can help amplify advantages
Advantage Amplification: Key points
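The key points themselves aren't reproduced in this outline. One concrete way to realize temporal aggregation, sketched here as an assumption rather than the lecture's exact scheme, is simple action repetition: hold the chosen action fixed for k environment steps and treat the aggregated transition as a single decision, so small per-step advantages accumulate relative to approximation noise before the next choice point.

def hold_action(env, action, k, gamma=0.99):
    # Repeat `action` for up to k environment steps and return the aggregated transition.
    # The caller treats this as a single decision whose bootstrap uses `discount`
    # in place of the usual one-step gamma factor.
    total_reward, discount = 0.0, 1.0
    next_state, done = None, False
    for _ in range(k):
        next_state, reward, done, _ = env.step(action)
        total_reward += discount * reward
        discount *= gamma
        if done:
            break
    return next_state, total_reward, discount, done

A Q-learning agent using this wrapper would bootstrap with discount * max_a Q(next_state, a) for the aggregated step.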
An MDP/RL Formulation. Objective: maximize cumulative 'user engagement' over a session
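In symbols (my rendering, not taken verbatim from the slides), this is the usual expected discounted return with the per-step reward taken to be an engagement signal:

    \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t \ge 0} \gamma^{t}\, r(s_t, A_t)\Big]

where A_t is the slate recommended at step t of the session and r(s_t, A_t) measures the engagement it generates.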
The Problem: Item Interaction. The presence of some items on the slate impacts the user's response to, and hence the value of, the others
User Choice: Assumptions Two key, but reasonable, assumptions
Full Q-Learning: Decomposition still holds; standard Q-learning update
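The decomposition itself is not written out in this outline. Assuming the two assumptions referenced above are the single-choice and reward/transition-depend-only-on-the-selected-item conditions used in SlateQ (Ie et al., 2019), the slate Q-value factors over items, roughly as:

    Q^{\pi}(s, A) \;=\; \sum_{i \in A} P(i \mid s, A)\, \bar{Q}^{\pi}(s, i)

where P(i | s, A) is the user's choice probability for item i from slate A and \bar{Q}^{\pi}(s, i) is the long-term value of the user consuming i. The "standard Q-learning update" then applies the usual temporal-difference rule to \bar{Q}, bootstrapping the next-state value through the same decomposition.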
Slate Optimization: Tractable. Standard formulation: Fractional mixed-integer program
Slate Optimization: Tractable. Standard formulation: Fractional mixed-integer program
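A reconstruction of the kind of formulation being referenced (assuming, as in SlateQ, a conditional-logit user choice model with item scores v(s, i) and a "no click" score v(s, \perp)): choosing a size-k slate to maximize the decomposed Q-value gives a fractional mixed-integer program of the form

    \max_{x \in \{0,1\}^{n}} \; \frac{\sum_{i} x_i\, v(s,i)\, \bar{Q}(s,i)}{v(s,\perp) + \sum_{i} x_i\, v(s,i)} \quad \text{s.t.} \quad \sum_{i} x_i = k

which can be linearized, and which also admits cheaper greedy and top-k selection heuristics.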
Synthetic Experiments Synthetic environment
Robustness to User Choice Models: Change user choice model to the cascade model (Joachims 2002)
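The cascade model isn't spelled out in this outline. A minimal sketch of the standard cascade choice process (the per-item attraction probabilities are hypothetical inputs): the user scans the slate top-down and clicks the first item that attracts them.

import random

def cascade_choice(slate, attraction, rng=random):
    # Cascade model: scan the slate top-down; click the first item whose
    # attraction probability fires, then stop scanning.
    for position, item in enumerate(slate):
        if rng.random() < attraction[item]:
            return item, position
    return None, None  # no click: the user abandoned the slate

Swapping a choice process like this in for the one assumed during training is the robustness check this slide refers to.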
Description:
Explore the challenges of applying reinforcement learning to recommender systems in this 52-minute lecture by Craig Boutilier from Google and the University of Toronto. Delve into key issues such as scaling for multiple users and actions, handling stochastic action sets, and addressing user learning over long horizons. Examine the MDP/RL formulation for maximizing user engagement, and investigate item interactions on recommendation slates. Learn about user choice assumptions, Q-learning decomposition, and slate optimization techniques. Analyze synthetic experiments and the robustness of models to different user choice behaviors, including the cascade model.
Reinforcement Learning in Recommender Systems - Some Challenges