Multi-agent reinforcement learning based optimal energy sensing threshold control in distributed cognitive radio networks with directional antenna

Thi Thu Hien Pham; Wonjong Noh; Sungrae Cho

ICT Express (Jun 2024)

Affiliations

Thi Thu Hien Pham: School of Computer Science and Engineering, Chung-Ang University, Seoul 06974, South Korea
Wonjong Noh: School of Software, Hallym University, Chuncheon 24252, South Korea; Corresponding author.
Sungrae Cho: School of Computer Science and Engineering, Chung-Ang University, Seoul 06974, South Korea; Corresponding author.

Journal volume & issue: Vol. 10, no. 3
pp. 472 – 478

Abstract

In cognitive radio networks (CRNs), it is crucial to develop an efficient and reliable spectrum detector that consistently provides accurate information about the channel state. In this work, we investigate cooperative spectrum sensing (CSS) in a fully distributed environment where all secondary users (SUs) are equipped with directional antennas and make decisions based solely on their local knowledge, without information sharing between SUs. First, we formulate a stochastic sequential optimization problem, which is NP-hard, that maximizes the SUs' detection accuracy through dynamic, optimal control of the energy sensing/detection threshold. This control enables SUs to select an available channel and sector without causing interference to the primary network. To address the problem in a distributed environment, we transform it into a decentralized partially observable Markov decision process (Dec-POMDP). Second, to determine the best control for the Dec-POMDP in a practical environment without any prior knowledge of state–action transition probabilities, we develop a multi-agent deep deterministic policy gradient (MADDPG)-based algorithm, referred to as MA-DCSS. This algorithm adopts the centralized training and decentralized execution (CTDE) architecture. Third, we analyze its computational complexity and show that the proposed approach is scalable: its complexity is polynomial in the number of channels, sectors, and SUs. Lastly, simulations confirm that the proposed scheme outperforms baseline algorithms in terms of convergence speed, detection accuracy, and false alarm probability.
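To make the role of the energy sensing threshold concrete, the following is a minimal Python sketch of a plain energy detector for a single SU. It is not the MA-DCSS algorithm from the paper: the threshold is fixed rather than learned, there is no directional antenna or multi-agent training, and the Gaussian signal model, the function names, and all parameter values (`snr`, `n_samples`, `trials`) are illustrative assumptions.

```python
import random


def energy_statistic(samples):
    """Average energy of the received samples."""
    return sum(x * x for x in samples) / len(samples)


def sense(samples, threshold):
    """Declare the channel busy (True) if the energy exceeds the threshold."""
    return energy_statistic(samples) > threshold


def estimate_rates(threshold, snr=0.5, n_samples=64, trials=2000, seed=0):
    """Monte Carlo estimate of (detection, false alarm) probabilities.

    Assumptions (not from the paper): unit-variance Gaussian noise and a
    zero-mean Gaussian primary-user signal whose variance equals `snr`.
    """
    rng = random.Random(seed)
    detections = 0
    false_alarms = 0
    for _ in range(trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        # Busy channel: the same noise plus an independent primary signal.
        busy = [n + rng.gauss(0.0, snr ** 0.5) for n in noise]
        false_alarms += sense(noise, threshold)  # idle channel flagged busy
        detections += sense(busy, threshold)     # busy channel flagged busy
    return detections / trials, false_alarms / trials


if __name__ == "__main__":
    for thr in (1.0, 1.25, 1.6):
        pd, pfa = estimate_rates(thr)
        print(f"threshold={thr:.2f}  P_detect={pd:.3f}  P_false_alarm={pfa:.3f}")
```

Raising the threshold lowers the false alarm probability but also lowers the detection probability. This trade-off is what a threshold controller has to balance as channel conditions change.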

Published in ICT Express

ISSN: 2405-9595 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.journals.elsevier.com/ict-express/
