End-to-End Autonomous Driving Decision Method Based on Improved TD3 Algorithm in Complex Scenarios

Tao Xu; Zhiwei Meng; Weike Lu; Zhongwen Tong

doi:10.3390/s24154962

Sensors (Jul 2024)

Affiliations

Tao Xu: National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China
Zhiwei Meng: National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China
Weike Lu: School of Rail Transportation, Soochow University, Suzhou 215031, China
Zhongwen Tong: National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun 130015, China

DOI: https://doi.org/10.3390/s24154962
Journal volume & issue: Vol. 24, no. 15
p. 4962

Abstract

The ability to make informed decisions in complex scenarios is crucial for intelligent automotive systems, yet traditional expert rules and similar methods often fall short in such contexts. Reinforcement learning has recently attracted significant attention for its strong decision-making capabilities. However, inaccurate value estimation by the target network limits its decision-making ability in complex scenarios. This paper focuses on the underestimation problem and proposes an end-to-end autonomous driving decision-making method based on an improved TD3 algorithm, which uses a forward-facing camera to capture input data. A new critic network is introduced to form a triple-critic structure and, combined with a target maximization operation, resolves the underestimation problem of the TD3 algorithm. A multi-timestep averaging method is then used to counter the policy instability introduced by the new single critic. In addition, the CARLA platform is used to construct multi-vehicle unprotected left-turn and congested lane-center driving scenarios in which the algorithm is verified. The results demonstrate that the proposed method outperforms the baseline DDPG and TD3 algorithms in convergence speed, estimation accuracy, and policy stability.
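For readers unfamiliar with TD3, the following sketch contrasts its standard clipped double-Q target, which bootstraps from the minimum of two target critics and therefore tends to underestimate, with one possible reading of the triple-critic idea described in the abstract. The abstract does not give the exact formulas, so the combination rule in `triple_critic_target` (the maximum of TD3's clipped estimate and a third critic's estimate) and the sliding-window averaging in `MultiStepAveragedTarget` are assumptions for illustration only, not the authors' method. All function and class names are hypothetical.

```python
from collections import deque


def td3_target(reward, gamma, done, q1_next, q2_next):
    """Standard TD3 clipped double-Q target: bootstrap from the smaller of
    the two target-critic estimates, which can bias the target low."""
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def triple_critic_target(reward, gamma, done, q1_next, q2_next, q3_next):
    """HYPOTHETICAL triple-critic target (assumption, not the paper's formula):
    keep TD3's min over critics 1 and 2, then take the max with a third
    critic's estimate to push back against underestimation."""
    clipped = min(q1_next, q2_next)
    return reward + gamma * (1.0 - done) * max(clipped, q3_next)


class MultiStepAveragedTarget:
    """HYPOTHETICAL multi-timestep averaging (assumption): return the mean of
    the values seen over the last k update steps to damp step-to-step swings,
    e.g. those caused by relying on a single extra critic."""

    def __init__(self, k):
        # deque with maxlen drops the oldest value once k values are stored
        self.history = deque(maxlen=k)

    def update(self, value):
        self.history.append(value)
        return sum(self.history) / len(self.history)


if __name__ == "__main__":
    print(td3_target(1.0, 0.99, 0.0, 10.0, 8.0))
    print(triple_critic_target(1.0, 0.99, 0.0, 10.0, 8.0, 9.0))
    averager = MultiStepAveragedTarget(k=3)
    print([averager.update(v) for v in (9.0, 12.0, 6.0, 3.0)])
```

For example, with reward 1.0, discount 0.99, and next-state estimates of 10.0, 8.0, and 9.0 from the three target critics, the TD3 target is 1 + 0.99 × 8 = 8.92, whereas the hypothetical triple-critic target is 1 + 0.99 × 9 = 9.91. The averager with a window of 3 smooths the sequence 9, 12, 6, 3 into 9, 10.5, 9, 7.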

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors