Intro

Outline

Stochastic Gradient Policy Theorem

REINFORCE Algorithm with a baseline

Performance Comparison

Temporal difference update

Actor Critic Algorithm

Advantage update

Advantage Actor Critic (A2C)

Continuous Actions

Deterministic Policy Gradient (DPG)

Description:

This 35-minute lecture by Pascal Poupart covers the foundations of actor-critic methods in reinforcement learning. It starts with the Stochastic Gradient Policy Theorem and the REINFORCE algorithm with a baseline, includes a performance comparison, and introduces temporal difference updates. It then moves on to the actor-critic algorithm, the advantage update, and Advantage Actor Critic (A2C), and ends with continuous actions and Deterministic Policy Gradient (DPG).

Actor Critic

Pascal Poupart

#Computer Science #Machine Learning #Reinforcement Learning #Deep Reinforcement Learning