Deep Q-Learning-Based Transmission Power Control of a High Altitude Platform Station with Spectrum Sharing

Seongjun Jo; Wooyeol Yang; Haing Kun Choi; Eonsu Noh; Han-Shin Jo; Jaedon Park

doi:10.3390/s22041630

Sensors (Feb 2022)

Affiliations

Seongjun Jo: Department of Electronic Engineering, Hanbat National University, Daejeon 34158, Korea
Wooyeol Yang: Department of Electronic Engineering, Hanbat National University, Daejeon 34158, Korea
Haing Kun Choi: TnB Radio Tech., Seoul 08504, Korea
Eonsu Noh: Agency for Defense Development, Daejeon 34186, Korea
Han-Shin Jo: Department of Electronic Engineering, Hanbat National University, Daejeon 34158, Korea
Jaedon Park: Agency for Defense Development, Daejeon 34186, Korea

Journal volume & issue: Vol. 22, no. 4, p. 1630

Abstract

A High Altitude Platform Station (HAPS) can facilitate high-speed data communication over wide areas using high-power line-of-sight communication; however, it can significantly interfere with existing systems. When the spectrum is shared with existing systems, the HAPS transmission power must be adjusted to satisfy the interference requirement for incumbent protection. However, excessive transmission power reduction can lead to severe degradation of the HAPS coverage. To solve this problem, we propose a multi-agent Deep Q-learning (DQL)-based transmission power control algorithm to minimize the outage probability of the HAPS downlink while satisfying the interference requirement of an interfered system. In addition, a double DQL (DDQL) is developed to guard against the potential action-value overestimation of the DQL. With a proper state, reward, and training process, all agents cooperatively learn a power control policy for achieving a near-optimal solution. The proposed DQL power control algorithm performs as well as, or close to, the optimal exhaustive search algorithm for varying positions of the interfered system. The proposed DQL and DDQL power control algorithms yield the same performance, which indicates that action-value overestimation does not adversely affect the quality of the learned policy.
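The difference between the DQL and DDQL update targets mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions of the two targets, not the authors' implementation: the function names, the discount factor `gamma`, and the example Q-values are all assumptions made for illustration, and the paper's actual state, reward, and network design are not described in this abstract.

```python
# Illustrative sketch only: standard DQL vs. double DQL (DDQL) bootstrap targets.
# All names and numbers here are hypothetical, not taken from the paper.

def dql_target(reward, next_q_target, gamma=0.9):
    """DQL target: the target network both selects and evaluates the next action.

    Taking the max over noisy estimates tends to overestimate action values.
    """
    return reward + gamma * max(next_q_target)


def ddql_target(reward, next_q_online, next_q_target, gamma=0.9):
    """DDQL target: the online network selects the next action,
    and the target network evaluates that chosen action."""
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-value estimates for three candidate power levels in the next state.
next_q_online = [1.0, 2.0, 0.5]   # online network estimates
next_q_target = [3.0, 1.5, 0.0]   # target network estimates

y_dql = dql_target(1.0, next_q_target, gamma=0.5)
y_ddql = ddql_target(1.0, next_q_online, next_q_target, gamma=0.5)
print(y_dql, y_ddql)
```

Because the DDQL target evaluates one particular action with the target network, it can never exceed the DQL target computed from the same target-network estimates, which is why it is commonly used to limit overestimation.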

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
