Deep Reinforcement Learning-Based Scheduling for Multiband Massive MIMO

Victor Hugo L. Lopes; Cleverson Veloso Nahum; Ryan M. Dreifuerst; Pedro Batista; Aldebaro Klautau; Kleber Vieira Cardoso; Robert W. Heath

doi:10.1109/ACCESS.2022.3224808

IEEE Access (Jan 2022)

Affiliations

Victor Hugo L. Lopes: Institute of Informatics, Federal University of Goiás, Goiânia, Brazil
Cleverson Veloso Nahum: Department of Computer and Telecommunication Engineering, Federal University of Pará, Belém, Brazil
Ryan M. Dreifuerst: Wireless Networking and Communications Group, The University of Texas at Austin, Austin, TX, USA
Pedro Batista: Ericsson Research, Stockholm, Sweden
Aldebaro Klautau: Department of Computer and Telecommunication Engineering, Federal University of Pará, Belém, Brazil
Kleber Vieira Cardoso: Institute of Informatics, Federal University of Goiás, Goiânia, Brazil
Robert W. Heath: Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC, USA

Journal volume & issue: Vol. 10
pp. 125509–125525

Abstract

Fifth-generation (5G) cellular communication systems have embraced massive multiple-input multiple-output (MIMO) in the low- and mid-band frequencies. In a multiband system, the base station can serve different users in each band, while each user equipment can operate in only one band at a time. This paper considers a massive MIMO system where channels are dynamically allocated in different frequency bands. We treat multiband massive MIMO as a scheduling and resource allocation problem and propose deep reinforcement learning (DRL) agents to perform user scheduling. The DRL agents use buffer and channel information to compose their observation space, and the agents’ reward function is designed to maximize the transmitted throughput and minimize the packet loss rate. We compare the proposed DRL algorithms with traditional baselines, such as maximum throughput and proportional fairness. The results show that the DRL models outperformed the baselines, achieving a 20% higher network sum rate and an 84% lower packet loss rate. Moreover, we compare different DRL algorithms with a focus on training time to assess the feasibility of implementing the DRL agents online, showing that the best agent needs about 50K training steps to converge.
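The two baselines named above (maximum throughput and proportional fairness) and the general shape of a throughput-versus-packet-loss reward can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `rates` is a hypothetical matrix of achievable per-user, per-band rates, `n_per_band` is a hypothetical cap on users served per band in one slot, the greedy assignment enforces the single-band constraint on each user equipment, and the reward's normalization and weight `w` are illustrative choices.

```python
import numpy as np


def greedy_assign(metric, n_per_band):
    """Assign each user to at most one band, visiting (user, band) pairs
    from highest to lowest metric. Returns band index per user, -1 if unserved.
    n_per_band is a hypothetical per-band user cap (assumption)."""
    n_users, n_bands = metric.shape
    assigned = np.full(n_users, -1)
    load = np.zeros(n_bands, dtype=int)
    # Flattened C-order index = user * n_bands + band; sort descending.
    for flat in np.argsort(metric, axis=None)[::-1]:
        u, b = divmod(int(flat), n_bands)
        # Single-band constraint: skip users already scheduled elsewhere.
        if assigned[u] == -1 and load[b] < n_per_band:
            assigned[u] = b
            load[b] += 1
    return assigned


def max_throughput_schedule(rates, n_per_band):
    """Maximum-throughput baseline: rank purely by instantaneous rate."""
    return greedy_assign(rates, n_per_band)


def proportional_fair_schedule(rates, avg_thr, n_per_band, eps=1e-9):
    """Proportional-fairness baseline: rank by rate / average throughput."""
    return greedy_assign(rates / (avg_thr[:, None] + eps), n_per_band)


def update_avg(avg_thr, rates, assigned, alpha=0.1):
    """Exponential moving average of served throughput (alpha is illustrative)."""
    served = np.zeros(len(avg_thr))
    mask = assigned >= 0
    served[mask] = rates[mask, assigned[mask]]
    return (1 - alpha) * avg_thr + alpha * served


def reward(tx_bits, max_bits, dropped_pkts, arrived_pkts, w=1.0):
    """Hypothetical reward: normalized throughput minus weighted loss rate."""
    thr = tx_bits / max_bits if max_bits > 0 else 0.0
    loss = dropped_pkts / arrived_pkts if arrived_pkts > 0 else 0.0
    return thr - w * loss
```

In this sketch, proportional fairness differs from maximum throughput only in the ranking metric: dividing by each user's running average throughput lets users who have been served little win a band even when their instantaneous rate is lower. A DRL agent, as described in the abstract, would instead learn the assignment from buffer and channel observations.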

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639