Hybrid NOMA/OMA-Based Dynamic Power Allocation Scheme Using Deep Reinforcement Learning in 5G Networks

Hoang Thi Huong Giang; Tran Nhut Khai Hoan; Pham Duy Thanh; Insoo Koo

doi:10.3390/app10124236

Applied Sciences (Jun 2020)

Affiliations

Hoang Thi Huong Giang: School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea
Tran Nhut Khai Hoan: College of Engineering Technology, Can Tho University, Can Tho 94000, Vietnam
Pham Duy Thanh: School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea
Insoo Koo: School of Electrical Engineering, University of Ulsan, Ulsan 44610, Korea

DOI: https://doi.org/10.3390/app10124236
Journal volume & issue: Vol. 10, no. 12, p. 4236

Abstract

Non-orthogonal multiple access (NOMA) is considered a promising technique for fifth-generation (5G) networks. Nevertheless, applying NOMA to a massive access scenario is relatively complex. Thus, in this paper, a hybrid NOMA/OMA scheme is considered for uplink wireless transmission systems in which multiple cognitive users (CUs) simultaneously transmit their data to a cognitive base station (CBS). We adopt a user-pairing algorithm in which the CUs are grouped into pairs, and each pair is assigned to an orthogonal sub-channel so that the two users in a pair apply NOMA to transmit data to the CBS without interfering with other pairs. The signals transmitted by the CUs of each NOMA pair can then be independently retrieved by using successive interference cancellation (SIC). The CUs are assumed to harvest solar energy to maintain their operations. Moreover, joint power and bandwidth allocation is performed at the CBS to optimize energy and spectrum efficiency and thereby maximize the long-term data rate of the system. To this end, we propose a deep actor-critic reinforcement learning (DACRL) algorithm that models the policy function with the actor and the value function with the critic of the agent (i.e., the CBS). The actor learns about the system dynamics by interacting with the environment, while the critic evaluates the actions taken, so that the CBS can optimally assign power and bandwidth to the CUs once the training phase finishes. Numerical results validate the superior performance of the proposed scheme compared with conventional schemes.
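
To make the SIC step concrete, the following is a minimal sketch of the textbook achievable rates for one two-user uplink NOMA pair on a single sub-channel. It is not the paper's system model, which the abstract does not specify. The function name, the parameter names, and the numeric values are all hypothetical, and the decoding order (stronger received signal first) is the usual assumption for uplink SIC.

```python
import math


def uplink_noma_pair_rates(p_strong, g_strong, p_weak, g_weak,
                           bandwidth_hz, noise_w):
    """Textbook Shannon rates (bit/s) for a two-user uplink NOMA pair.

    Assumptions (illustrative, not taken from the paper):
    - p_* are transmit powers in watts, g_* are channel power gains;
    - noise_w is the total noise power in the sub-channel, in watts;
    - the receiver decodes the stronger received signal first and
      cancels it perfectly before decoding the weaker one.
    """
    rx_strong = p_strong * g_strong  # received power of the first-decoded user
    rx_weak = p_weak * g_weak        # received power of the second-decoded user

    # Step 1: decode the strong user, treating the weak user as interference.
    rate_strong = bandwidth_hz * math.log2(1.0 + rx_strong / (rx_weak + noise_w))

    # Step 2: after cancelling the strong signal, the weak user sees only noise.
    rate_weak = bandwidth_hz * math.log2(1.0 + rx_weak / noise_w)

    return rate_strong, rate_weak


# Hypothetical example values: 1 MHz sub-channel, 0.1 W per user.
r_strong, r_weak = uplink_noma_pair_rates(
    p_strong=0.1, g_strong=1e-6,
    p_weak=0.1, g_weak=1e-7,
    bandwidth_hz=1e6, noise_w=1e-9,
)
print(f"strong user: {r_strong / 1e6:.2f} Mbit/s, weak user: {r_weak / 1e6:.2f} Mbit/s")
```

Under these assumptions the two rates add up to the sum capacity of the pair, B log2(1 + (p1 g1 + p2 g2) / N), which is why the power split within a pair and the bandwidth given to each pair are the natural quantities for the CBS to allocate.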

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
