Electronic Research Archive (Jun 2024)

Uniformity of markov elements in deep reinforcement learning for traffic signal control

  • Bao-Lin Ye,
  • Peng Wu,
  • Lingxi Li,
  • Weimin Wu

DOI
https://doi.org/10.3934/era.2024174
Journal volume & issue
Vol. 32, no. 6
pp. 3843 – 3866

Abstract

Read online

Traffic signal control (TSC) plays a crucial role in enhancing traffic capacity. In recent years, researchers have demonstrated improved performance by utilizing deep reinforcement learning (DRL) for optimizing TSC. However, existing DRL frameworks predominantly rely on manually crafted states, actions, and reward designs, which limit direct information exchange between the DRL agent and the environment. To overcome this challenge, we propose a novel design method that maintains consistency among states, actions, and rewards, named uniformity state-action-reward (USAR) method for TSC. The USAR method relies on: 1) Updating the action selection for the next time step using a formula based on the state perceived by the agent at the current time step, thereby encouraging rapid convergence to the optimal strategy from state perception to action; and 2) integrating the state representation with the reward function design, allowing for precise assessment of the efficacy of past action strategies based on the received feedback rewards. The consistency-preserving design method jointly optimizes the TSC strategy through the updates and feedback among the Markov elements. Furthermore, the method proposed in this paper employs a residual block into the DRL model. It introduces an additional pathway between the input and output layers to transfer feature information, thus promoting the flow of information across different network layers. To assess the effectiveness of our approach, we conducted a series of simulation experiments using the simulation of urban mobility. The USAR method, incorporating a residual block, outperformed other methods and exhibited the best performance in several evaluation metrics.

Keywords