Tongxin xuebao (May 2024)
Spectrum resource allocation for high-throughput satellite communications based on behavior cloning
Abstract
In high-throughput multi-beam satellite systems, the dimensionality of the spectrum resource allocation problem grows drastically with the number of satellite beams and service users, causing the complexity of solving it to rise exponentially. To address this challenge, a two-stage algorithm combining behavior cloning (BC) with deep reinforcement learning (DRL) is proposed. In the first stage, the policy network is pretrained by behavior cloning on existing decision data from satellite operations; imitating expert behavior reduces blind exploration and accelerates convergence. In the second stage, the policy network is further optimized with proximal policy optimization (PPO), and a convolutional block attention module (CBAM) is employed to better extract user traffic features, thereby improving overall performance. Simulation results show that the proposed algorithm outperforms the benchmark algorithms in convergence speed and stability, and also delivers better system delay, average system satisfaction, and spectrum efficiency.
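The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the linear softmax policy, the toy one-hot "traffic level" states, the three hypothetical spectrum blocks, and all names and hyperparameters (`bc_pretrain`, `ppo_clip_objective`, `lr`, `eps`) are assumptions chosen for illustration, and the CBAM feature extractor is omitted. Stage 1 fits the policy to expert (state, action) pairs by minimizing cross-entropy; stage 2 shows only the standard PPO clipped surrogate that such a pretrained policy would then be optimized against.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(weights, state):
    """Action probabilities of a linear softmax policy (illustrative stand-in
    for the policy network). weights[a] is the weight vector of action a."""
    logits = [sum(w_i * s_i for w_i, s_i in zip(w, state)) for w in weights]
    return softmax(logits)


def bc_pretrain(weights, demos, lr=0.5, epochs=200):
    """Stage 1 (behavior cloning): gradient descent on -log pi(expert_action | state)."""
    for _ in range(epochs):
        for state, expert_action in demos:
            probs = policy_probs(weights, state)
            for a, w in enumerate(weights):
                # d(-log pi(expert|s)) / d(logit_a) = p_a - 1[a == expert]
                grad_logit = probs[a] - (1.0 if a == expert_action else 0.0)
                for i, s_i in enumerate(state):
                    w[i] -= lr * grad_logit * s_i
    return weights


def ppo_clip_objective(new_prob, old_prob, advantage, eps=0.2):
    """Stage 2 (PPO): clipped surrogate for one sample, to be maximized."""
    ratio = new_prob / old_prob
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical expert data: one-hot traffic level -> index of the allocated block.
demos = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
weights = [[0.0, 0.0, 0.0] for _ in range(3)]  # 3 actions x 3 state features
weights = bc_pretrain(weights, demos)

# Greedy action of the pretrained policy for each demonstrated state.
greedy_actions = [
    max(range(3), key=lambda a, s=s: policy_probs(weights, s)[a]) for s, _ in demos
]
```

After pretraining, the greedy action for each demonstrated state matches the expert's choice, so the subsequent PPO updates start from a policy that already imitates the recorded decisions rather than from a random one. The clipping in `ppo_clip_objective` limits how far a single update can move the policy away from that starting point.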