Applied Sciences (Jun 2024)
Designing Reward Functions Using Active Preference Learning for Reinforcement Learning in Autonomous Driving Navigation
Abstract
This study presents a method based on active preference learning to overcome the challenges of designing reward functions for autonomous navigation. Results obtained from training with artificially designed reward functions may not accurately reflect human intentions. We focus on the limitations of traditional reward functions, which often fail to facilitate complex tasks in continuous state spaces. We propose the adoption of active preference learning to resolve these issues and to generate reward functions that align with human preferences. This approach leverages an individual’s subjective preferences to guide an agent’s learning process, enabling the creation of reward functions that reflect human desires. We utilize mutual information to generate informative queries and apply information gained to balance the agent’s uncertainty with the human’s response capacity, encouraging the agent to pose straightforward and informative questions. We further employ the No-U-Turn Sampler (NUTS) method to refine the belief model, which outperforms that constructed using the Metropolis algorithm. Subsequently, we retrain the agent using reward weights derived from active preference learning. As a result, our autonomous driving vehicle can navigate between random starting and ending points without dependence on high-precision maps or routing, relying solely on its forward vision. We validate our approach’s performance within the CARLA simulation environment. Our algorithm significantly improved the success rate of autonomous driving navigation tasks that originally failed due to artificially designed rewards, increasing it to approximately 60%. Experimental results show significant improvement over the baseline algorithm, providing a solid foundation for enhancing navigation capabilities in autonomous driving systems and advancing the field of autonomous driving intelligence.
Keywords