Jisuanji Kexue yu Tansuo (Journal of Frontiers of Computer Science and Technology), Nov 2022

Stochastic Ensemble Policy Transfer

  • CHANG Tian, ZHANG Zongzhang, YU Yang

DOI
https://doi.org/10.3778/j.issn.1673-9418.2105043
Journal volume & issue
Vol. 16, no. 11
pp. 2531 – 2536

Abstract


Reinforcement learning (RL) has achieved great success on sequential decision-making problems. Alongside the rapid advances of RL, transfer learning (TL) has arisen as an important technique for accelerating the RL learning process by leveraging and transferring external knowledge. Policy transfer is a kind of transfer learning in which the external knowledge takes the form of teacher policies from source tasks. Existing policy transfer methods either transfer knowledge by measuring similarities between source and target tasks, or select the best policy by estimating the performance of source policies on the target task. However, performance estimation can be unreliable, which may lead to negative transfer. To address this problem, this paper develops a novel policy transfer method called stochastic ensemble policy transfer (SEPT), which generates a teacher policy for transfer instead of choosing a single policy from the source policy library. SEPT recasts policy transfer as an option learning problem in order to obtain termination probabilities. A teacher policy is then integrated from the policy library, with the probability weights of the source policies computed from the termination probabilities. The knowledge of the teacher policy is transferred by policy distillation. Experimental results show that SEPT accelerates RL effectively and outperforms other state-of-the-art policy transfer methods in both discrete and continuous action spaces.
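Since the abstract only outlines the mechanism, the sketch below illustrates one plausible reading of it: the teacher is a state-dependent mixture pi_teacher(a|s) = sum_i w_i(s) pi_i(a|s) over the source policies, with weights derived from the learned termination probabilities, and the student is trained by minimizing a KL-based distillation loss against this teacher. The weighting rule (softmax over 1 minus the termination probability) and all names here (source_policies, termination_probs, etc.) are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal NumPy sketch of an ensemble-teacher policy transfer step.
# ASSUMPTION: weights come from a softmax over (1 - termination probability);
# SEPT's actual weighting is defined in the paper, not the abstract.
import numpy as np

def teacher_distribution(state, source_policies, termination_probs):
    """Mix source-policy action distributions into one teacher distribution.

    source_policies:   list of callables, state -> action distribution (sums to 1)
    termination_probs: callable, state -> per-policy termination probability
                       (in SEPT these come from option learning)
    """
    beta = termination_probs(state)                   # shape: (n_policies,)
    logits = 1.0 - beta                               # low termination => high weight (assumed)
    w = np.exp(logits) / np.exp(logits).sum()         # softmax weights over source policies
    dists = np.stack([pi(state) for pi in source_policies])  # (n_policies, n_actions)
    return w @ dists                                  # weighted mixture over actions

def distillation_loss(teacher_dist, student_dist, eps=1e-8):
    """KL(teacher || student): a standard policy-distillation objective."""
    return float(np.sum(teacher_dist * (np.log(teacher_dist + eps)
                                        - np.log(student_dist + eps))))

# Toy usage: two source policies over three discrete actions.
if __name__ == "__main__":
    pi1 = lambda s: np.array([0.7, 0.2, 0.1])
    pi2 = lambda s: np.array([0.1, 0.3, 0.6])
    betas = lambda s: np.array([0.2, 0.8])            # policy 1 rarely terminates here
    state = None                                      # placeholder state
    teacher = teacher_distribution(state, [pi1, pi2], betas)
    student = np.array([1/3, 1/3, 1/3])
    print("teacher:", teacher, "loss:", distillation_loss(teacher, student))
```

Forming the teacher as a mixture, rather than picking the argmax source policy, is what lets the method avoid committing to a single (possibly misestimated) source policy, which is the failure mode the abstract identifies as negative transfer.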
