Outline:
1. Intro
2. The Importance of Aligning Powerful AI Systems
3. Reinforcement Learning Example: Cliff Walking
4. Aligning TSC Agents with Rewards
5. Objective: Minimizing CO2 Emission at a Signalized Intersection
6. Reinforcement Learning Setup
7. Training the Neural Network - Deep Q-Network (DQN)
8. Motivation - Uninformative Emission Penalty
9. Informativeness and Expressiveness for Alignment
10. Findings Comparing Rewards
11. Findings - Rewards are sensitive to parameterization
12. Conclusion - Informativeness and Expressiveness are necessary
13. Technologies that helped a LOT
Description:
Watch a 21-minute conference talk exploring the challenges of designing effective reward functions for Deep Reinforcement Learning (DRL) in traffic signal control, with a focus on minimizing CO2 emissions at intersections. Dive into the complexities of training DRL agents using the SUMO (Simulation of Urban MObility) simulator, examining how different reward metrics and combinations affect agent performance. Learn why emission-based rewards prove inefficient for training Deep Q-Networks (DQN), discover how sensitive agent performance is to variations in reward parameters, and understand why certain reward formulations perform inconsistently across different scenarios. Explore key findings about reward properties that impact reinforcement learning-based traffic signal control, including the importance of informativeness and expressiveness in reward design. Follow along as presenters Christian Medeiros Adriano and Max Schumacher demonstrate practical examples like Cliff Walking, discuss the alignment of Traffic Signal Control agents with rewards, and share insights about technologies that significantly helped their research.
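
To make the talk's central object concrete, here is a minimal sketch of an emission-penalized step reward for a single signalized intersection, read from SUMO via its TraCI Python API. The lane IDs, config file name, episode length, and the weight `ALPHA` are hypothetical placeholders; the talk's actual reward formulations and parameterizations are not reproduced here.

```python
# Sketch of an emission-penalized step reward for one intersection,
# computed from SUMO via the TraCI Python API. All names below
# (lane IDs, config file, weight) are hypothetical placeholders.
import traci

INCOMING_LANES = ["north_in_0", "south_in_0", "east_in_0", "west_in_0"]
ALPHA = 0.001  # weight on CO2 (mg per step); results are sensitive to such choices

def step_reward():
    """Negative of queued vehicles plus scaled CO2 emitted in the last step."""
    halting = sum(traci.lane.getLastStepHaltingNumber(l) for l in INCOMING_LANES)
    co2_mg = sum(traci.lane.getCO2Emission(l) for l in INCOMING_LANES)
    return -(halting + ALPHA * co2_mg)

traci.start(["sumo", "-c", "intersection.sumocfg"])  # hypothetical scenario
try:
    for _ in range(1000):
        traci.simulationStep()
        reward = step_reward()  # would be stored in a DQN agent's replay buffer
finally:
    traci.close()
```

Per the talk's reported findings, a purely emission-based penalty (dropping the halting term above) is an uninformative training signal for a DQN, and agent performance shifts noticeably with the choice of weights such as `ALPHA`.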

Challenges in Reward Design for Reinforcement Learning-based Traffic Signal Control - An Investigation Using CO2 Emission Objective

Eclipse Foundation