Reinforcement-Learning-Based Robust Resource Management for Multi-Radio Systems

James Delaney; Steve Dowey; Chi-Tsun Cheng

doi:10.3390/s23104821

Sensors (May 2023)

Affiliations

James Delaney: Manufacturing, Materials and Mechatronics, School of Engineering, STEM College, RMIT University, 124 La Trobe St., Melbourne, VIC 3000, Australia
Steve Dowey: Manufacturing, Materials and Mechatronics, School of Engineering, STEM College, RMIT University, 124 La Trobe St., Melbourne, VIC 3000, Australia
Chi-Tsun Cheng: Manufacturing, Materials and Mechatronics, School of Engineering, STEM College, RMIT University, 124 La Trobe St., Melbourne, VIC 3000, Australia

DOI: https://doi.org/10.3390/s23104821
Journal volume & issue: Vol. 23, no. 10, p. 4821

Abstract

The advent of the Internet of Things (IoT) has increased demand for sensing devices with multiple integrated wireless transceivers. These platforms can combine multiple radio technologies to exploit their differing characteristics. Intelligent radio selection techniques allow these systems to become highly adaptive, ensuring more robust and reliable communications under dynamic channel conditions. In this paper, we focus on the wireless links between devices carried by deployed operating personnel and intermediary access-point infrastructure. We use multi-radio platforms and wireless devices with multiple and diverse transceiver technologies to produce robust and reliable links through the adaptive control of available transceivers. Here, the term ‘robust’ refers to communications that can be maintained despite changes in the environmental and radio conditions, i.e., during periods of interference caused by non-cooperative actors or by multi-path or fading conditions in the physical environment. We apply a multi-objective reinforcement learning (MORL) framework to a multi-radio selection and power control problem. We propose independent reward functions to manage the trade-off between the conflicting objectives of minimised power consumption and maximised bit rate. We also adopt an adaptive exploration strategy for learning a robust behaviour policy and compare its online performance to conventional methods. An extension to the multi-objective state–action–reward–state–action (SARSA) algorithm is proposed to implement this adaptive exploration strategy. Applying adaptive exploration to the extended multi-objective SARSA algorithm yields a 20% increase in the F1 score compared with decayed exploration policies.
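The general idea of multi-objective SARSA with one reward signal per objective can be illustrated with a small tabular sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-radio toy environment, the reward shapes, the linear scalarisation weights, and in particular the rule that adapts the exploration rate from the recent temporal-difference error, which is only a stand-in for the adaptive exploration strategy the abstract mentions.

```python
import random

# All names and numbers here are hypothetical, chosen only for illustration.
RADIOS = ("radio_a", "radio_b")      # two assumed transceivers
POWER_LEVELS = (0, 1, 2)             # low, medium, high transmit power
ACTIONS = [(r, p) for r in RADIOS for p in POWER_LEVELS]
STATES = ("clear", "interfered")     # coarse channel condition
N_OBJ = 2                            # objective 0: bit rate, 1: power saving
WEIGHTS = (0.5, 0.5)                 # assumed linear scalarisation weights


def step(state, action, rng):
    """Toy environment: returns (next_state, (bit_rate_reward, power_reward))."""
    radio, power = action
    if radio == "radio_a":
        # Fast radio that only survives interference at full power.
        rate = 1.0 if (state == "clear" or power == 2) else 0.0
    else:
        # Slow radio that is unaffected by interference.
        rate = 0.4
    power_reward = 1.0 - power / 2.0  # lower transmit power earns more reward
    next_state = "interfered" if rng.random() < 0.3 else "clear"
    return next_state, (rate, power_reward)


def scalarise(vec):
    """Combine a per-objective vector into one number with fixed weights."""
    return sum(w * v for w, v in zip(WEIGHTS, vec))


def choose(q, state, epsilon, rng):
    """Epsilon-greedy selection over the scalarised Q-values."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: scalarise(q[(state, a)]))


def train(steps=5000, alpha=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # One Q-value per objective for every (state, action) pair.
    q = {(s, a): [0.0] * N_OBJ for s in STATES for a in ACTIONS}
    epsilon = 1.0
    state = "clear"
    action = choose(q, state, epsilon, rng)
    for _ in range(steps):
        next_state, rewards = step(state, action, rng)
        next_action = choose(q, next_state, epsilon, rng)
        td_errors = []
        for i in range(N_OBJ):
            # On-policy SARSA update, applied to each objective separately.
            target = rewards[i] + gamma * q[(next_state, next_action)][i]
            td = target - q[(state, action)][i]
            q[(state, action)][i] += alpha * td
            td_errors.append(td)
        # Assumed adaptive rule (not the paper's): epsilon follows a running
        # average of the scalarised TD-error magnitude, so exploration rises
        # again when the environment surprises the agent.
        surprise = min(1.0, abs(scalarise(td_errors)))
        epsilon = 0.95 * epsilon + 0.05 * surprise
        state, action = next_state, next_action
    return q, epsilon
```

Calling `q, epsilon = train()` returns the learned per-objective Q-table and the final exploration rate. Changing `WEIGHTS` shifts the learned behaviour between favouring bit rate and favouring low power, which is the trade-off the independent reward functions are meant to expose.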

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
