Radioengineering (Sep 2024)
Meta-Reinforcement Learning in Time-Varying UAV Communications: Adaptive Anti-Jamming Channel Selection
Abstract
Unmanned Aerial Vehicle (UAV) communication networks are vulnerable to malicious jamming and co-channel interference, both of which degrade network performance. Developing anti-jamming methods that enhance communication security is therefore a significant challenge. In this paper, we propose a novel anti-jamming channel selection scheme for a multi-channel, multi-UAV network. We first formulate the anti-jamming problem as a Partially Observable Stochastic Game (POSG), in which UAV pairs with partial observability compete for a limited number of communication channels against a Markov jammer. To ensure rapid adaptation to the dynamic jamming environment, we propose a Meta-Mean-Field Q-learning (MMFQ) algorithm, which provides a Nash Equilibrium (NE) solution to the POSG. Furthermore, we derive an upper bound on the loss function of MMFQ and prove the convergence of the proposed algorithm. Simulation results demonstrate that the proposed algorithm achieves a higher average reward than the benchmark algorithms, improving throughput and resource utilization, especially in large-scale UAV communication networks.
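To make the setting concrete, the following minimal Python sketch shows a single UAV pair learning to avoid a Markov jammer. It is not the MMFQ algorithm: it uses plain tabular Q-learning with uniform exploration, and it omits the mean-field interaction between UAV pairs, the meta-learning step, and partial observability. The sweep-style transition matrix, the 0/1 reward, the assumption that the learner observes the previously jammed channel, and all function names and parameter values are hypothetical choices made for illustration only.

```python
import random


def make_jammer_matrix(num_channels, stay_prob):
    """Row-stochastic transition matrix for a hypothetical sweep jammer:
    it stays on its channel with stay_prob, otherwise hops to the next one."""
    matrix = [[0.0] * num_channels for _ in range(num_channels)]
    for ch in range(num_channels):
        matrix[ch][ch] += stay_prob
        matrix[ch][(ch + 1) % num_channels] += 1.0 - stay_prob
    return matrix


def step_jammer(matrix, current, rng):
    """Sample the jammer's next channel from the row of its current channel."""
    return rng.choices(range(len(matrix)), weights=matrix[current])[0]


def q_update(q, state, action, reward, next_state, alpha, gamma):
    """One tabular Q-learning step (plain Q-learning, not MMFQ)."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def train(num_channels=4, stay_prob=0.9, steps=8000,
          alpha=0.1, gamma=0.5, seed=0):
    """Learn Q[state][action], where the state is the channel the jammer
    occupied in the previous slot (assumed observable in this sketch)."""
    rng = random.Random(seed)
    matrix = make_jammer_matrix(num_channels, stay_prob)
    q = [[0.0] * num_channels for _ in range(num_channels)]
    jammed = 0
    for _ in range(steps):
        state = jammed
        action = rng.randrange(num_channels)  # uniform random exploration
        jammed = step_jammer(matrix, jammed, rng)
        # Hypothetical reward: 1 for a clean transmission, 0 if jammed.
        reward = 0.0 if action == jammed else 1.0
        q_update(q, state, action, reward, jammed, alpha, gamma)
    return q


def greedy_channels(q):
    """For each state, the channel with the highest learned Q-value."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


if __name__ == "__main__":
    print(greedy_channels(train()))
```

Because the jammer in this sketch evolves independently of the learner's action, the greedy channel in each state is determined by the expected immediate reward, so the learned policy should steer away from the channel the jammer is most likely to occupy next. Extending the sketch toward the scheme described in the abstract would require, at a minimum, conditioning each pair's Q-values on the other pairs' aggregate channel choices and adapting across changing jammer behaviors.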