Tongxin xuebao (Jul 2024)
Fast deep reinforcement learning anti-jamming algorithm based on similar sample generation
Abstract
To improve the learning efficiency of anti-jamming algorithms based on deep reinforcement learning and enable them to adapt more quickly to unknown jamming environments, a fast deep reinforcement learning anti-jamming algorithm based on similar sample generation was proposed. By combining a bisimulation-derived similarity measure over state-action pairs with an anti-jamming algorithm built on the deep Q-network, the proposed algorithm was able to quickly learn effective multi-domain anti-jamming strategies in unknown, dynamic jamming environments. Specifically, after each transmission action was completed, the algorithm first interacted with the environment through the deep Q-network to obtain actual state-action pairs. It then generated a set of similar state-action pairs based on bisimulation and used them to produce simulated training samples. In this way, the algorithm obtained a large number of training samples at every iteration step, which significantly accelerated the training process and the convergence speed. Simulation results show that, under comb sweep jamming and intelligent blocking jamming, the proposed algorithm converges rapidly, and its normalized throughput after convergence is significantly higher than that of the conventional deep Q-network algorithm, the Q-learning algorithm, and the improved Q-learning algorithm based on knowledge reuse.
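To make the sample-generation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes states are channel-occupancy vectors from spectrum sensing and actions index transmission channels, and it uses a frequency-shift invariance as a simple stand-in for the paper's bisimulation-based similarity: shifting a state and its action together yields a "bisimilar" pair that inherits the observed reward. The names `N_CHANNELS`, `SIM_SHIFTS`, and the sweep-jammer model are all illustrative assumptions.

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

N_CHANNELS = 16   # hypothetical number of sub-channels
GAMMA = 0.9       # discount factor (assumed)
SIM_SHIFTS = 4    # similar samples generated per real transition (assumed)

class QNet(nn.Module):
    """Plain DQN head mapping a sensed spectrum state to per-channel Q-values."""
    def __init__(self, n: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n, 64), nn.ReLU(), nn.Linear(64, n))
    def forward(self, s):
        return self.net(s)

def similar_samples(s, a, r, s2, n_shift=SIM_SHIFTS):
    """Generate simulated transitions from one real transition by cyclically
    shifting state and action together; under the assumed shift invariance the
    shifted pair keeps the same reward and an equivalently shifted next state."""
    return [(np.roll(s, k), (a + k) % len(s), r, np.roll(s2, k))
            for k in range(1, n_shift + 1)]

def dqn_update(q, opt, batch):
    """One standard DQN gradient step on a batch of (real + simulated) samples."""
    s, a, r, s2 = zip(*batch)
    s = torch.tensor(np.array(s), dtype=torch.float32)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * q(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy interaction loop against a hypothetical sweep jammer.
q = QNet(N_CHANNELS)
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)
s = np.zeros(N_CHANNELS); s[0] = 1.0                # jammed-channel indicator
for t in range(200):
    with torch.no_grad():                           # epsilon-greedy action
        a = random.randrange(N_CHANNELS) if random.random() < 0.1 else \
            int(q(torch.tensor(s, dtype=torch.float32)).argmax())
    jammed = t % N_CHANNELS                         # sweep jammer (assumed model)
    r = 0.0 if a == jammed else 1.0                 # throughput-style reward
    s2 = np.zeros(N_CHANNELS); s2[(jammed + 1) % N_CHANNELS] = 1.0
    buffer.append((s, a, r, s2))
    buffer.extend(similar_samples(s, a, r, s2))     # enlarge the sample set
    if len(buffer) >= 32:
        dqn_update(q, opt, random.sample(list(buffer), 32))
    s = s2
```

The key line for the technique described in the abstract is `buffer.extend(similar_samples(...))`: every real interaction contributes several simulated transitions to the replay buffer, so each iteration trains on many more samples than the single real one, which is what accelerates convergence. The actual paper derives the similar pairs from a bisimulation metric rather than the frequency shift assumed here.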