Multi-Agent Reinforcement Learning for Joint Cooperative Spectrum Sensing and Channel Access in Cognitive UAV Networks

Weiheng Jiang; Wanxin Yu; Wenbo Wang; Tiancong Huang

doi:10.3390/s22041651

Sensors (Feb 2022)

Affiliations

Weiheng Jiang: Communication Measurement and Control Center, Chongqing University, Chongqing 400044, China
Wanxin Yu: Communication Measurement and Control Center, Chongqing University, Chongqing 400044, China
Wenbo Wang: Faculty of Engineering, Bar Ilan University, Ramat Gan 5290002, Israel
Tiancong Huang: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing 400044, China

DOI: https://doi.org/10.3390/s22041651
Journal volume & issue: Vol. 22, no. 4, p. 1651

Abstract

This paper studies the problem of distributed spectrum/channel access for cognitive radio-enabled unmanned aerial vehicles (CUAVs) that overlay upon primary channels. Under the framework of cooperative spectrum sensing and opportunistic transmission, a one-shot optimization problem for channel allocation is formulated, aiming to maximize the expected cumulative weighted reward of multiple CUAVs. To handle the uncertainty arising from the lack of prior knowledge about primary user activities and from the absence of a channel-access coordinator, the original problem is cast as a competition and cooperation hybrid multi-agent reinforcement learning (CCH-MARL) problem within the framework of a Markov game (MG). A value-iteration-based RL algorithm featuring upper confidence bound-Hoeffding (UCB-H) strategy searching is then proposed, treating each CUAV as an independent learner (IL). To address the curse of dimensionality, the UCB-H strategy is further extended with a double deep Q-network (DDQN). Numerical simulations show that the proposed algorithms converge efficiently to stable strategies and significantly improve network performance compared with benchmark algorithms such as vanilla Q-learning and DDQN.
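
To make the UCB-H idea in the abstract concrete, the following is a minimal sketch of a generic tabular, episodic Q-learning update with a Hoeffding-style exploration bonus, as one independent learner might run it. It is not the paper's algorithm: the class name `UCBHLearner`, the integer state/action encoding, the horizon, and the constants `c` and `iota` are all assumptions made for illustration, and the paper's reward definition and state design are not reproduced here.

```python
import math


class UCBHLearner:
    """Illustrative tabular Q-learner with a Hoeffding-style UCB bonus.

    Hypothetical helper: states and actions are plain integer indices,
    and `c` / `iota` are tuning constants chosen for illustration only.
    """

    def __init__(self, n_states, n_actions, horizon, c=1.0, iota=1.0):
        self.H = horizon
        self.c = c
        self.iota = iota
        self.n_actions = n_actions
        # Optimistic initialisation: every Q-value starts at the horizon H.
        self.Q = [[[float(horizon)] * n_actions for _ in range(n_states)]
                  for _ in range(horizon)]
        # Visit counts per (step, state, action).
        self.N = [[[0] * n_actions for _ in range(n_states)]
                  for _ in range(horizon)]

    def act(self, h, s):
        """Greedy action w.r.t. the optimistic Q-values (ties go to the lowest index)."""
        q = self.Q[h][s]
        return max(range(self.n_actions), key=lambda a: q[a])

    def value(self, h, s):
        """State value, clipped at H; zero beyond the last step of the episode."""
        if h >= self.H:
            return 0.0
        return min(float(self.H), max(self.Q[h][s]))

    def update(self, h, s, a, r, s_next):
        """One Q-update with learning rate (H+1)/(H+t) and bonus c*sqrt(H^3*iota/t)."""
        self.N[h][s][a] += 1
        t = self.N[h][s][a]
        alpha = (self.H + 1) / (self.H + t)
        bonus = self.c * math.sqrt(self.H ** 3 * self.iota / t)
        target = r + self.value(h + 1, s_next) + bonus
        self.Q[h][s][a] = (1 - alpha) * self.Q[h][s][a] + alpha * target


# Hypothetical usage: one learner per CUAV, 1 state, 2 channels, 2-step episodes.
learner = UCBHLearner(n_states=1, n_actions=2, horizon=2)
first_action = learner.act(0, 0)
learner.update(h=1, s=0, a=0, r=0.5, s_next=0)
```

In an independent-learner setup, each CUAV would hold its own instance and treat the other agents as part of the environment. The bonus shrinks as a state-action pair is visited more often, so rarely tried channels stay attractive until they have been explored.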

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
