Designing Reward Functions Using Active Preference Learning for Reinforcement Learning in Autonomous Driving Navigation

Lun Ge; Xiaoguang Zhou; Yongqiang Li

doi:10.3390/app14114845

Applied Sciences (Jun 2024)

Affiliations

Lun Ge: School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China
Xiaoguang Zhou: School of Modern Post (School of Automation), Beijing University of Posts and Telecommunications, Beijing 100876, China
Yongqiang Li: Mogo Auto Intelligence and Telematics Information Technology Co., Ltd., Beijing 100010, China

DOI: https://doi.org/10.3390/app14114845
Journal volume & issue: Vol. 14, no. 11, p. 4845

Abstract

This study presents an active preference learning method for designing reward functions for autonomous navigation. Agents trained with manually designed reward functions may not behave as humans intend, and such reward functions often fail on complex tasks in continuous state spaces. To address both problems, we use active preference learning to generate reward functions that align with human preferences: an individual’s subjective preferences guide the agent’s learning process. We use mutual information to generate informative queries; this information-gain criterion balances the agent’s uncertainty against the human’s ability to respond, encouraging the agent to pose questions that are both easy to answer and informative. We further employ the No-U-Turn Sampler (NUTS) to refine the belief model, which outperforms a belief model constructed using the Metropolis algorithm. We then retrain the agent using the reward weights derived from active preference learning. As a result, our autonomous vehicle can navigate between random starting and ending points without relying on high-precision maps or routing, using only its forward vision. We validate the approach in the CARLA simulation environment. On autonomous driving navigation tasks that originally failed under manually designed rewards, our algorithm raised the success rate to approximately 60%. Experimental results show a significant improvement over the baseline algorithm, providing a solid foundation for enhancing navigation capabilities in autonomous driving systems and advancing autonomous driving intelligence.
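
The query-selection idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a linear reward w · φ over trajectory features, a softmax (logistic) model of the human's choice between two trajectories, and a belief over w represented by samples (for example, draws from a sampler such as NUTS). The names `choice_probs`, `information_gain`, and `select_query` are hypothetical. The mutual information between the answer and w is estimated as the entropy of the expected answer (the agent's uncertainty) minus the expected entropy of the answer given w (how hard the question is for the human).

```python
import numpy as np

def choice_probs(weights, feats_a, feats_b):
    """P(answer | w) for each sampled w, under an assumed logistic choice model.

    weights: (M, d) samples from the current belief over reward weights.
    feats_a, feats_b: (d,) feature vectors of the two candidate trajectories.
    Returns an (M, 2) array: column 0 is P(prefer A), column 1 is P(prefer B).
    """
    reward_gap = weights @ feats_a - weights @ feats_b
    p_a = 1.0 / (1.0 + np.exp(-reward_gap))
    return np.stack([p_a, 1.0 - p_a], axis=1)

def information_gain(weights, feats_a, feats_b):
    """Sample-based estimate (in bits) of the mutual information
    between the human's answer and the reward weights."""
    p = choice_probs(weights, feats_a, feats_b)   # (M, 2)
    marginal = p.mean(axis=0)                     # expected answer distribution
    eps = 1e-12                                   # guards against log(0)
    per_sample = np.sum(p * (np.log2(p + eps) - np.log2(marginal + eps)), axis=1)
    return float(per_sample.mean())

def select_query(weights, candidates):
    """Return the index of the candidate pair with the highest estimated gain."""
    gains = [information_gain(weights, a, b) for a, b in candidates]
    return int(np.argmax(gains)), gains

# Toy belief: four weight samples in a 2-D feature space (illustrative values).
belief = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
candidates = [
    (np.array([1.0, 1.0]), np.array([1.0, 1.0])),   # identical pair: uninformative
    (np.array([1.0, 0.0]), np.array([-1.0, 0.0])),  # pair that splits the belief
]
best, gains = select_query(belief, candidates)
print(best, gains)
```

In this toy setup the identical pair scores zero, because every weight sample predicts a coin flip, while the second pair scores higher because different weight samples predict different answers with some confidence.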

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
