1. Intro
2. Motivations
3. Policy-Space Response Oracles (PSRO) [Lanctot et al. '17]: maintains a pool of strategies for each player and iteratively expands it with best responses
4. Motivating Example: "Deal or No Deal" [1]
5. Example: Bach or Stravinsky
6. PSRO on games beyond purely adversarial domains (no search)
7. Extending AlphaZero to Large Imperfect Information Games
8. MCTS in PSRO: A Bayesian Interpretation
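The PSRO loop named in item 3 can be illustrated with a toy sketch: the rock-paper-scissors payoffs, exact best-response oracle, and uniform meta-strategy solver below are illustrative assumptions, not the talk's actual setup (which uses trained RL oracles and Nash-bargaining-based meta-solvers).

```python
import numpy as np

# Row player's payoffs in rock-paper-scissors (zero-sum, antisymmetric),
# standing in for the game the oracle step would actually train on.
A = np.array([[ 0, -1,  1],
              [ 1,  0, -1],
              [-1,  1,  0]], dtype=float)

def best_response(payoff, opponent_mix):
    """Exact pure-strategy best response (the 'oracle' step of PSRO)."""
    return int(np.argmax(payoff @ opponent_mix))

def pool_to_mix(pool, meta):
    """Action distribution induced by a strategy pool and meta-strategy."""
    mix = np.zeros(3)
    for prob, action in zip(meta, pool):
        mix[action] += prob
    return mix

# PSRO loop: each player keeps a pool of strategies and repeatedly adds
# a best response to the opponent's current meta-strategy mixture.
row_pool, col_pool = [0], [0]  # both players start with "rock"
for _ in range(5):
    row_meta = np.ones(len(row_pool)) / len(row_pool)  # uniform meta-solver
    col_meta = np.ones(len(col_pool)) / len(col_pool)
    row_mix = pool_to_mix(row_pool, row_meta)
    col_mix = pool_to_mix(col_pool, col_meta)
    row_pool.append(best_response(A, col_mix))
    col_pool.append(best_response(-A.T, row_mix))

print(sorted(set(row_pool)))  # the pool grows to cover all three actions
```

Swapping the uniform meta-solver for a Nash (or, per the talk, Nash-bargaining) solver and the enumeration oracle for an RL-trained policy recovers the full algorithm.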
Description:
Explore a DS4DM Coffee Talk presentation on combining tree-search, generative models, and Nash bargaining concepts in game-theoretic reinforcement learning. Delve into the augmentation of Policy-Space Response Oracles (PSRO) with a novel search procedure using generative sampling of world states and new meta-strategy solvers based on the Nash bargaining solution. Examine the evaluation of PSRO's ability to compute approximate Nash equilibrium and its performance in negotiation games like Colored Trails and Deal or No Deal. Learn about behavioral studies involving human participants negotiating with AI agents, and discover how search with generative modeling enhances policy strength, enables online Bayesian co-player prediction, and produces agents capable of achieving comparable social welfare in human-AI negotiations.
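The Nash bargaining solution mentioned above picks, among deals that improve on the disagreement outcome for both players, the one maximizing the product of utility gains. A minimal sketch with made-up deal values (not data from Colored Trails or Deal or No Deal):

```python
# Hypothetical negotiation outcomes as (utility_A, utility_B) pairs,
# plus the disagreement point d both players fall back to on "no deal".
deals = [(8, 2), (6, 5), (5, 7), (2, 8)]
d = (1, 1)

def nash_product(u, d):
    """Product of utility gains over the disagreement point; the Nash
    bargaining solution maximizes this among individually rational deals."""
    gain_a, gain_b = u[0] - d[0], u[1] - d[1]
    return gain_a * gain_b if min(gain_a, gain_b) >= 0 else float("-inf")

nbs = max(deals, key=lambda u: nash_product(u, d))
print(nbs)  # → (5, 7): the largest product of gains, 4 * 6 = 24
```

This objective rewards balanced surplus splits, which is why it serves as a social-welfare-oriented meta-strategy solver in the negotiation games discussed.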

Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning

GERAD Research Center