Aerospace (Oct 2022)
A Policy-Reuse Algorithm Based on Destination Position Prediction for Aircraft Guidance Using Deep Reinforcement Learning
Abstract
Artificial intelligence for aircraft guidance is a hot research topic, and deep reinforcement learning is one of the promising methods. However, due to the different movement patterns of destinations in different guidance tasks, it is inefficient to train agents from scratch. In this article, a policy-reuse algorithm based on destination position prediction is proposed to solve this problem. First, the reward function is optimized to improve flight trajectory quality and training efficiency. Then, by predicting the possible termination position of the destinations in different moving patterns, the problem is transformed into a fixed-position destination aircraft guidance problem. Last, taking the agent in the fixed-position destination scenario as the baseline agent, a new guidance agent can be trained efficiently. Simulation results show that this method can significantly improve the training efficiency of agents in new tasks, and its performance is stable in tasks with different similarities. This research broadens the application scope of the policy-reuse approach and also enlightens the research in other fields.
Keywords