Transmission Control in NB-IoT With Model-Based Reinforcement Learning

Juan J. Alcaraz; Fernando Losilla; Francisco-Javier Gonzalez-Castano

doi:10.1109/ACCESS.2023.3284990

IEEE Access (Jan 2023)

Affiliations

Juan J. Alcaraz: Department of Information and Communication Technologies, Technical University of Cartagena, Cartagena, Spain
Fernando Losilla: Department of Information and Communication Technologies, Technical University of Cartagena, Cartagena, Spain
Francisco-Javier Gonzalez-Castano: Telematics Engineering Department, Universidad de Vigo, Vigo, Spain

DOI: https://doi.org/10.1109/ACCESS.2023.3284990
Journal volume & issue: Vol. 11, pp. 57991–58005

Abstract

In Narrowband Internet of Things (NB-IoT), the control of uplink transmissions is a complex task involving device scheduling, resource allocation in the carrier, and the configuration of link-adaptation parameters. Existing heuristic proposals address the problem only partially, and reinforcement learning (RL) seems, a priori, to be the most effective approach, given its success in similar control problems. However, the low sample efficiency of conventional (model-free) RL algorithms is an important limitation for their deployment in real systems. During their initial learning stages, RL agents need to explore the policy space by selecting actions that are, in general, highly ineffective. In an NB-IoT access network, this implies a disproportionate increase in transmission delays. In this paper, we make two contributions to enable the adoption of RL in NB-IoT. First, we present a multi-agent architecture based on the principle of task division. Second, we propose a new model-based RL algorithm for link adaptation characterized by its high sample efficiency. The combination of these two strategies results in an algorithm that, during the learning phase, is able to maintain the transmission delay on the order of hundreds of milliseconds, whereas model-free RL algorithms cause delays of up to several seconds. This allows our approach to be deployed, without prior training, in an operating NB-IoT network and to learn to control it efficiently without degrading its performance.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
