Safe Adaptive Deep Reinforcement Learning for Autonomous Driving in Urban Environments. Additional Filter? How and Where?

Sina Alighanbari; Nasser L. Azad

doi:10.1109/ACCESS.2021.3119915

IEEE Access (Jan 2021)

Safe Adaptive Deep Reinforcement Learning for Autonomous Driving in Urban Environments. Additional Filter? How and Where?

Sina Alighanbari,
Nasser L. Azad

Affiliations

Sina Alighanbari: ORCiD; Department of Systems Design Engineering, SHEVS Laboratory, University of Waterloo, Waterloo, ON, Canada
Nasser L. Azad: ORCiD; Department of Systems Design Engineering, Faculty of Engineering, University of Waterloo, Waterloo, ON, Canada

DOI: https://doi.org/10.1109/ACCESS.2021.3119915
Journal volume & issue: Vol. 9
pp. 141347 – 141359

Abstract

Read online

Autonomous driving (AD) provides a reliable solution for safe driving by replacing human drivers responsible for the majority of accidents. The emergence of Machine Learning, specifically Deep Reinforcement Learning (DRL), and its ability to solve complex games proved its potential to address AD challenges. However, model-free methods still suffer from safety-related issues that can be resolved using safe-DRL approaches. The addition of model-based safety filters to the learning-based algorithms provides safety bounds on their performance and constraint satisfaction. In this paper, we investigate the addition of a safety filter based on Model Predictive Control and show an increase in mean testing episode reward by 110% from −75 mean episode reward during testing for 50 episodes for Deep Deterministic Policy Gradient ( $DDPG$ ) to 7.758. We study the impacts of safety filters (7.758 mean reward), heuristic rules, bounded additive noises (0.49% performance increase comparing to noise-free case), and exploration (3.425 mean reward) on the learning algorithm. We compare the effects of filters in the context of simulated exploration and bounded exploration and prove that bounded exploration results in 9.86% increase in mean reward and 12.95% decrease in std comparing to the other method. Additionally, inspired by Deep Internal Learning and biological mechanisms like brain plasticity, we investigate the idea of using each sample for training only once instead of utilizing stochastic batches which increases the mean testing accumulated reward by 1.87% and leads to the best performance (7.942 mean reward and 0.048 std). Finally, the results demonstrate better automotive results for our proposed method than DDPG. Our proposed method, DDPG with safety filter in bounded exploration and adaptive learning under noisy input conditions, has a success rate of 100% under different traffic densities for the simulation environment used in this paper and our assumptions. The proposed method’s automotive results are shown for a braking scenario to avoid collision with other road users.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639

About the journal

Abstract

Keywords