Optimal User Scheduling in Multi Antenna System Using Multi Agent Reinforcement Learning

Muddasar Naeem; Antonio Coronato; Zaib Ullah; Sajid Bashir; Giovanni Paragliola

doi:10.3390/s22218278

Sensors (Oct 2022)

Affiliations

Muddasar Naeem: Institute of High Performance Computing and Networking, National Research Council of Italy, 80131 Naples, Italy
Antonio Coronato: Centro di Ricerche sulle Tecnologie ICT per la Salute ed il Benessere, Università Giustino Fortunato, 82100 Benevento, Italy
Zaib Ullah: Institute of High Performance Computing and Networking, National Research Council of Italy, 80131 Naples, Italy
Sajid Bashir: Department of Electrical Engineering, National University of Sciences & Technology, Islamabad 44000, Pakistan
Giovanni Paragliola: Institute of High Performance Computing and Networking, National Research Council of Italy, 80131 Naples, Italy

Journal volume & issue: Vol. 22, no. 21
p. 8278

Abstract

Multiple-Input Multiple-Output (MIMO) systems have gained significant attention from the research community because of their potential to improve data rates. However, a suitable scheduling mechanism is required to distribute the available spectrum resources efficiently and enhance system capacity. This paper investigates the user selection problem in a Multi-User MIMO (MU-MIMO) environment using a multi-agent Reinforcement Learning (RL) methodology. By exploiting the spatial degrees of freedom offered by multiple antennas, several devices can transmit simultaneously in every time slot. We aim to develop an optimal scheduling policy that selects the group of users to be scheduled for transmission, given the channel conditions and resource blocks at the beginning of each time slot. We first formulate the MU-MIMO scheduling problem as a single-state Markov Decision Process (MDP) and then obtain the optimal policy by solving the formulated MDP with RL. Using the aggregated sum-rate of the group of users selected for transmission as the performance metric, we report a 20% higher sum-rate than conventional methods.
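Because the scheduling problem is cast as a single-state MDP, the value of each action (a candidate user group) reduces to its expected reward. The learning task therefore behaves much like a multi-armed bandit. The sketch below illustrates that idea under loudly stated assumptions. It is not the authors' method. It uses a single epsilon-greedy learner in place of their multi-agent scheme, whose details the abstract does not give. The constants `MEAN_GAINS`, `GROUP_SIZE`, `NOISE` and `COUPLING` are all hypothetical, and so are the helper names `sum_rate`, `draw_channel` and `learn_schedule`. The interference model is also a simplification: each co-scheduled user leaks a fixed fraction of its power instead of going through a real MU-MIMO precoder. The reward is the aggregated sum-rate of the selected group.

```python
import itertools
import math
import random

# Hypothetical system parameters (NOT taken from the paper): five users with
# different average channel gains; GROUP_SIZE users are scheduled per slot.
MEAN_GAINS = [0.2, 0.5, 1.0, 3.0, 8.0]
GROUP_SIZE = 2
NOISE = 1.0
COUPLING = 0.1  # assumed residual inter-user interference factor


def sum_rate(group, gains, noise=NOISE, coupling=COUPLING):
    """Aggregated sum-rate (bit/s/Hz) of the users in `group` for one slot.

    Each scheduled user treats the other scheduled users' signals as
    interference scaled by `coupling`, a stand-in for imperfect spatial
    separation between co-scheduled users.
    """
    total = 0.0
    for k in group:
        interference = coupling * sum(gains[j] for j in group if j != k)
        sinr = gains[k] / (noise + interference)
        total += math.log2(1.0 + sinr)
    return total


def draw_channel(rng):
    """Assumed Rayleigh block fading: power gain = mean gain * Exp(1) per slot."""
    return [g * rng.expovariate(1.0) for g in MEAN_GAINS]


def learn_schedule(slots=20000, epsilon=0.1, seed=0):
    """Epsilon-greedy value learning on a single-state MDP.

    With only one state, each action (a user group) needs just one value
    estimate q[a], its expected sum-rate, kept as a running mean.
    """
    rng = random.Random(seed)
    actions = list(itertools.combinations(range(len(MEAN_GAINS)), GROUP_SIZE))
    q = {a: 0.0 for a in actions}
    n = {a: 0 for a in actions}
    for _ in range(slots):
        if rng.random() < epsilon:
            a = rng.choice(actions)          # explore a random user group
        else:
            a = max(actions, key=q.get)      # exploit the best group so far
        reward = sum_rate(a, draw_channel(rng))
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]       # incremental mean update
    best = max(actions, key=q.get)
    return best, q


if __name__ == "__main__":
    best, q = learn_schedule()
    print("selected group:", best, "estimated sum-rate: %.2f" % q[best])
```

Keeping a running mean per group is the simplest estimator for a stationary single-state problem. A constant step size would be a natural alternative if the channel statistics drift over time. The sketch ignores the instantaneous channel when choosing an action, which matches the single-state formulation but is only one possible reading of it.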

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors