Jisuanji kexue yu tansuo (Journal of Frontiers of Computer Science and Technology), Apr 2024
Policy Search Reinforcement Learning Method in Latent Space
Abstract
Policy search is an efficient learning method in deep reinforcement learning (DRL): it can solve large-scale problems with continuous state and action spaces and is widely used in real-world applications. However, such methods usually require a large number of trajectory samples and extensive training time, and may generalize poorly, making it difficult to transfer a learned policy to even seemingly small changes in the environment. To address these problems, this paper proposes a latent-space policy search DRL method. Specifically, it extends the idea of state representation learning to action representation learning, i.e., a policy is learned in the latent space of action representations, and the action representations are then mapped to the real action space. By introducing representation learning models, this paper abandons the traditional end-to-end training manner of DRL and divides the whole task into two stages: large-scale representation model learning and small-scale policy model learning, where unsupervised learning methods are employed to learn the representation models and policy search methods are used to learn the small-scale policy model. The large-scale representation models ensure generalization capacity and expressiveness, while the small-scale policy model reduces the burden of policy learning, thereby alleviating, to some extent, the low sample utilization, low learning efficiency, and weak action-selection generalization of DRL. Finally, the effectiveness of introducing latent state and action representations is demonstrated on the intelligent control tasks CarRacing and Cheetah.
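The two-stage pipeline described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the dimensions, the linear encoder/decoder standing in for the unsupervised representation models, and the linear latent policy are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed environment dimensions (hypothetical, for illustration only).
STATE_DIM, ACTION_DIM = 8, 3
Z_STATE, Z_ACTION = 4, 2  # latent dimensions (assumed)

# --- Stage 1: large-scale representation models. ---
# In the paper these are trained with unsupervised learning; here they are
# stand-in fixed linear maps: a state encoder and an action decoder.
state_enc = rng.normal(size=(Z_STATE, STATE_DIM))     # state -> latent state
action_dec = rng.normal(size=(ACTION_DIM, Z_ACTION))  # latent action -> action

# --- Stage 2: small-scale policy learned entirely in latent space. ---
# Its parameter count (Z_ACTION * Z_STATE) is far smaller than a policy
# over the raw state/action spaces, easing policy search.
policy = rng.normal(size=(Z_ACTION, Z_STATE)) * 0.1

def act(state):
    z_s = state_enc @ state      # encode observation into latent state
    z_a = np.tanh(policy @ z_s)  # latent-space policy output
    return action_dec @ z_a      # decode back to the real action space

action = act(rng.normal(size=STATE_DIM))
print(action.shape)  # (3,)
```

Only the small `policy` matrix would be optimized by policy search; the representation models are frozen after their unsupervised pre-training stage.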
Keywords