Core Concepts: Return and Discount → the return G_t is the total discounted reward from time-step t
Core Concepts: Value Function(s)
Core Concepts: Policies
Core Concepts: Markov Assumption
Core Concepts: Markov Decision Process
Model-based: Dynamic Programming
Model-based Reinforcement Learning
Bellman equation
Policy evaluation example
Generalized Policy Iteration
GridWorlds: Sokoban
The rest of the iceberg
Continuous action/state spaces
Exploration vs Exploitation
Credit Assignment
Sparse, noisy and delayed rewards
Reward hacking
Model-free: Reinforcement Learning
Monte Carlo evaluation
Temporal difference evaluation
Q-learning: Tabular setting
OpenAI Gym
DeepMind Lab
Part Two: Deep Reinforcement Learning
Value function approximation
Policy Gradients: Baseline and Advantage
Policy Gradients: Actor-Critic for StarCraft 2
Policy Gradients: PPO for Dota 2
Policy Gradients: PPO for robotics
Policy Gradients: Sonic Retro Contest
Big picture view of the main algorithms
More RL applications
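
Several outline entries (Return and Discount, Value Functions, Bellman equation) refer to standard definitions. In the usual notation (not spelled out in the outline, but conventional in RL), the return, the state-value function under a policy π, and the Bellman expectation equation are:

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad \gamma \in [0, 1]

v_\pi(s) = \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]

v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a) \left[ r + \gamma \, v_\pi(s') \right]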
Description:
Dive into the world of Reinforcement Learning (RL) with this comprehensive talk by Ben Duffy. Explore the evolution of sequential decision making, from achieving superhuman performance in complex board games to solving 2D Atari and 3D games like Doom, Quake, and StarCraft. Gain insights into the pursuit of creating artificial general intelligence and understand the main breakthroughs, paradigms, formulations, and obstacles within RL. Learn about the agent-environment loop, core concepts such as state, reward, value functions, and policies, and delve into model-based and model-free RL approaches. Discover applications in robotics, language grounding, and multi-agent collaboration. Examine deep reinforcement learning techniques, including value function approximation and policy gradients, and their applications in various domains. Get up to speed with the current state of the field and its future directions in this informative one-hour lecture.