Challenges in Reward Design for Reinforcement Learning-based Traffic Signal Control: An Investigation using a CO2 Emission Objective

Max Schumacher; Christian Medeiros Adriano; Holger Giese

doi:10.52825/scp.v4i.222

SUMO Conference Proceedings (Jun 2023)

Affiliations

Max Schumacher: Hasso Plattner Institute
Christian Medeiros Adriano: Hasso Plattner Institute
Holger Giese: Hasso Plattner Institute

Journal volume & issue: Vol. 4

Abstract

Deep Reinforcement Learning (DRL) is a promising data-driven approach for traffic signal control, especially because DRL agents can learn to adapt to varying traffic demands. To do so, a DRL agent maximizes a scalar reward by interacting with an environment. However, formulating a suitable reward, one that aligns agent behavior with user objectives, is an open research problem. We investigate this problem in the context of traffic signal control with the objective of minimizing CO2 emissions at intersections. Because CO2 emissions can be affected by multiple factors outside the agent’s control, it is unclear whether an emission-based metric works well as a reward or whether a proxy reward is needed. To obtain a suitable reward, we evaluate various rewards and combinations of rewards. For each reward, we train a Deep Q-Network (DQN) on homogeneous and heterogeneous traffic scenarios. We use the SUMO (Simulation of Urban MObility) simulator and its default emission model to monitor the agent’s performance on the specified rewards and on CO2 emissions. Our experiments show that a CO2 emission-based reward is inefficient for training a DQN, that the agent’s performance is sensitive to variations in the parameters of combined rewards, and that some reward formulations do not work equally well in different scenarios. Based on these results, we identify desirable reward properties that have implications for reward design in reinforcement learning-based traffic signal control.
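To make the idea of a "combination of rewards" concrete, the following is a minimal sketch of one common form: a weighted sum of two penalty terms collapsed into a single scalar. It is not the formulation used in the paper. The names StepObservation, combined_reward, w_co2, w_queue, and co2_scale, the choice of queue length as the second term, and all default values are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class StepObservation:
    """Hypothetical per-step measurements at one intersection."""
    co2_mg: float      # CO2 emitted during the step (unit assumed: mg)
    queue_length: int  # number of halted vehicles (assumed proxy metric)


def combined_reward(obs: StepObservation,
                    w_co2: float = 0.5,
                    w_queue: float = 0.5,
                    co2_scale: float = 1000.0) -> float:
    """Return a scalar reward as a negated weighted sum of two penalties.

    The CO2 term is divided by an assumed scale factor so that both
    terms have a comparable magnitude before weighting.
    """
    co2_penalty = obs.co2_mg / co2_scale
    queue_penalty = float(obs.queue_length)
    return -(w_co2 * co2_penalty + w_queue * queue_penalty)


obs = StepObservation(co2_mg=2000.0, queue_length=4)
balanced = combined_reward(obs)                            # both terms weighted equally
co2_only = combined_reward(obs, w_co2=1.0, w_queue=0.0)    # emission term only
```

Under these assumptions, the same observation yields different reward values depending only on the weights, which is one way the sensitivity to the parameters of combined rewards reported in the abstract can arise.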

Published in SUMO Conference Proceedings

ISSN: 2750-4425 (Online)
Publisher: TIB Open Publishing
Country of publisher: Germany
LCC subjects: Social Sciences: Transportation and communications
Website: https://www.tib-op.org/ojs/index.php/scp/index
