Sequence Decision Transformer for Adaptive Traffic Signal Control

Rui Zhao; Haofeng Hu; Yun Li; Yuze Fan; Fei Gao; Zhenhai Gao

doi:10.3390/s24196202

Sensors (Sep 2024)

Affiliations

Rui Zhao: College of Automotive Engineering, Jilin University, Changchun 130025, China
Haofeng Hu: College of Automotive Engineering, Jilin University, Changchun 130025, China
Yun Li: Graduate School of Information and Science Technology, The University of Tokyo, Tokyo 113-8654, Japan
Yuze Fan: College of Automotive Engineering, Jilin University, Changchun 130025, China
Fei Gao: National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130025, China
Zhenhai Gao: National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130025, China

DOI: https://doi.org/10.3390/s24196202
Journal volume & issue: Vol. 24, no. 19, p. 6202

Abstract

Urban traffic congestion poses significant economic and environmental challenges worldwide. Adaptive Traffic Signal Control (ATSC) has emerged as a promising way to mitigate these issues, and recent advances in deep reinforcement learning (DRL) have further enhanced its capabilities. This paper introduces a novel DRL-based ATSC approach named the Sequence Decision Transformer (SDT). SDT enhances DRL with attention mechanisms and adapts sequence decision models, akin to those used in advanced natural language processing, to the complexities of urban traffic management. First, the ATSC problem is modeled as a Markov Decision Process (MDP), with the observation space, action space, and reward function carefully defined. We then propose SDT, which is specifically tailored to solve this MDP. The SDT model uses a transformer-based architecture with an encoder and a decoder in an actor–critic structure. The encoder processes observations and outputs both encoded data for the decoder and value estimates for parameter updates. The decoder, acting as the policy network, outputs the agent’s actions. Proximal Policy Optimization (PPO) is used to update the policy network based on historical data, enhancing decision-making in ATSC. This approach significantly reduces training times, effectively manages larger observation spaces, captures dynamic changes in traffic conditions more accurately, and enhances traffic throughput. Finally, the SDT model is trained and evaluated in synthetic scenarios by comparing the number of vehicles, average speed, and queue length against three baselines: PPO, a DQN tailored for ATSC, and FRAP, a state-of-the-art ATSC algorithm. SDT shows improvements of 26.8%, 150%, and 21.7% over traditional ATSC algorithms, and of 18%, 30%, and 15.6% over FRAP. This research underscores the potential of integrating Large Language Models (LLMs) with DRL for traffic management, offering a promising solution to urban congestion.
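
The abstract states that PPO updates the policy network but gives no implementation details. As an illustration only, the following is a minimal sketch of the standard PPO clipped surrogate loss in plain Python. It is not the authors' code: the function name `ppo_clipped_loss`, the default `clip_eps=0.2`, and the list-based inputs are assumptions made for this example, and advantage estimation (which in SDT would rely on the encoder's value estimates) is taken as given.

```python
import math


def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over samples.

    Hypothetical helper for illustration; clip_eps=0.2 is a common
    default, not a value reported in the abstract.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio within [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice between the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the surrogate objective.
    return -total / len(advantages)
```

With a positive advantage, the clip caps how much the loss can reward moving the policy away from the one that collected the data. With a negative advantage, the unclipped term is kept whenever it is the more pessimistic of the two.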

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors