Applied Sciences (Apr 2024)
Learning Ad Hoc Cooperation Policies from Limited Priors via Meta-Reinforcement Learning
Abstract
When agents must collaborate without prior coordination, the multi-agent cooperation problem becomes an ad hoc teamwork (AHT) problem. Mainstream research on AHT is divided into type-based and type-free methods. The former depends on a library of known teammate types to infer the current teammate's type, while the latter requires no such knowledge at all. However, in many real-world applications, both extremes are impractical: full knowledge of teammate types is rarely available, yet assuming no prior knowledge discards information that usually exists. This research therefore focuses on the challenge of AHT with limited known types. To this end, this paper proposes a method called Few typE-based Ad hoc Teamwork via meta-reinforcement learning (FEAT), which adapts to teammates within a single episode using only a small set of known types. FEAT employs limited priors about known types to train a highly adaptive policy through meta-reinforcement learning, and it uses this policy to automatically generate a diverse repository of teammate types. During ad hoc cooperation, the agent can autonomously identify known teammate types and then directly apply the corresponding pre-trained optimal cooperative policy, or swiftly update the meta-policy to respond to teammates of unknown types. Comprehensive experiments in the pursuit domain validate the effectiveness of the algorithm and its components.
Keywords