Deep Q-Network Learning Based Downlink Resource Allocation for Hybrid RF/VLC Systems

Shivanshu Shrivastava; Bin Chen; Chen Chen; Hui Wang; Mingjun Dai

doi:10.1109/ACCESS.2020.3014427

IEEE Access (Jan 2020)

Affiliations

Shivanshu Shrivastava: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Bin Chen: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Chen Chen: School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China
Hui Wang: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China
Mingjun Dai: College of Electronics and Information Engineering, Shenzhen University, Shenzhen, China

DOI: https://doi.org/10.1109/ACCESS.2020.3014427
Journal volume & issue: Vol. 8
pp. 149412 – 149434

Abstract

Developing high-data-rate systems is crucial to meeting the requirements of fifth-generation mobile systems. Hybrid radio frequency/visible light communication (RF/VLC) has emerged as a promising mechanism for achieving this objective. In hybrid RF/VLC, data rate maximization is subject to constraints on bandwidth, power, and user association. The joint optimization of bandwidth, power, and user association to maximize the data rate is non-concave, and an optimal solution is difficult to obtain with conventional optimization algorithms. Existing solutions rest on presuming at least one of the optimization variables. This article overcomes that limitation by solving the joint optimization problem in hybrid RF/VLC with a deep Q-network (DQN) learning based algorithm, which has been recognized as an efficient learning-based mechanism for optimization. Our system model considers one RF access point (AP) and multiple VLC APs, and it also incorporates idle APs. The DQN learning based algorithm finds an optimal policy with the help of an action-value function. Because the data sets for the considered system are large, a multi-layered network is used to approximate the action-value function. Finally, a transfer learning based algorithm is proposed for maximizing the total data rate of the system when a new user equipment (UE) enters; it uses the information about the environment from before the arrival of the new UE. Simulations show that, compared with existing conventional optimization algorithms, the proposed algorithms improve the achievable sum-rate and the number of iterations required for convergence by more than 10% and 54%, respectively.
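
The abstract mentions finding a policy through an action-value function. As a rough illustration only, the sketch below shows the generic Q-learning ingredients that DQN methods build on: epsilon-greedy action selection, the Bellman target, and a single value update. It is not the authors' algorithm. The function names, the list-based stand-in for the multi-layered network, and the idea that an action indexes a candidate AP for one UE are all assumptions made for this example; the reward is assumed to be an achievable sum-rate supplied by the caller.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action.

    q_values: estimated action values for the current state
    (hypothetically, one entry per candidate AP for a UE).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def td_target(reward, next_q_values, gamma, done=False):
    """Bellman target: r + gamma * max over a' of Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def q_update(q_values, action, target, lr):
    """Move Q(s, action) a fraction lr toward the target.

    A list-based stand-in for the gradient step that a
    multi-layered Q-network would take; returns a new list.
    """
    updated = list(q_values)
    updated[action] += lr * (target - updated[action])
    return updated

# Toy usage with made-up numbers (not taken from the paper).
rng = random.Random(0)
q = [0.0, 0.0, 0.0]                      # three hypothetical actions
a = epsilon_greedy(q, epsilon=0.1, rng=rng)
y = td_target(reward=1.0, next_q_values=[0.2, 0.5, 0.1], gamma=0.9)
q = q_update(q, a, y, lr=0.5)
```

In an actual DQN, the list of values would come from a forward pass of the network, and `q_update` would be replaced by a gradient step on the squared difference between the network output and the target.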

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
