IEEE Access (Jan 2020)

Virtual Relay Selection in LTE-V: A Deep Reinforcement Learning Approach to Heterogeneous Data

  • Xunsheng Du,
  • Hien Van Nguyen,
  • Chunxiao Jiang,
  • Yong Li,
  • F. Richard Yu,
  • Zhu Han

DOI: https://doi.org/10.1109/ACCESS.2020.2997729
Journal volume & issue: Vol. 8, pp. 102477–102492

Abstract

The development of Long Term Evolution (LTE) enables wireless communication with high transmission rates, low latency, and wide coverage. These features support the next generation of vehicle-to-everything (V2X) communication, known as LTE-V. Among the various technologies in LTE-V, placing relay nodes on vehicles is a promising approach to reduce power and energy consumption and to extend the transmission range. In this paper, we consider the virtual relay node selection problem, in which a base station transmits data to a vehicle relay (also known as a helper) that further disseminates the received data to nearby vehicular subscribers. Selecting the vehicle relay node is challenging because the utility of a selection is known only after the action has been taken. A further challenge is that traditional pure optimization is inapplicable because only imperfect information is available. Motivated by the recent success of AlphaGo Zero, we employ deep reinforcement learning (DRL) as a powerful tool to address these challenges and solve the problem without global information. We build a bridge between traffic information and the relay-selection decision based on the observation that the utility of a vehicle relay is highly correlated with traffic density. In our work, deep convolutional neural networks are first applied to extract traffic patterns and then learn the traffic and network topology. Deep learning (DL) maps features of the traffic and communication topology to decisions, while Q-learning dynamically updates the utilities of those decisions through trials in the environment. Finally, the overall rewards are calculated to evaluate these decisions, and the Q function is updated accordingly. Simulation results based on real traffic data validate that the proposed approach achieves high utility performance.
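To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of the idea: a convolutional network maps a traffic-density grid to Q-values over candidate relay vehicles, an epsilon-greedy policy picks a relay, and a one-step Q-learning update regresses Q(s, a) toward r + γ·max Q(s', a'). The grid size, network architecture, reward, and hyperparameters here are illustrative assumptions, not the paper's published configuration.

```python
import random
import torch
import torch.nn as nn

# Illustrative assumptions -- the paper does not publish these exact values.
GRID = 16          # traffic-density map is GRID x GRID
N_CANDIDATES = 8   # number of candidate relay vehicles (actions)
GAMMA = 0.9        # discount factor
EPS = 0.1          # epsilon-greedy exploration rate

class RelayQNet(nn.Module):
    """CNN mapping a traffic-density map to one Q-value per candidate relay."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Linear(32 * GRID * GRID, N_CANDIDATES)

    def forward(self, x):
        return self.head(self.features(x))

qnet = RelayQNet()
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def select_relay(state):
    """Epsilon-greedy choice of a relay index from the current traffic map."""
    if random.random() < EPS:
        return random.randrange(N_CANDIDATES)
    with torch.no_grad():
        return int(qnet(state).argmax(dim=1))

def td_update(state, action, reward, next_state):
    """One Q-learning step: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    with torch.no_grad():
        target = reward + GAMMA * qnet(next_state).max(dim=1).values
    q_sa = qnet(state)[torch.arange(state.size(0)), action]
    loss = nn.functional.mse_loss(q_sa, target)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Toy usage with random "traffic maps" standing in for real traffic data.
s = torch.rand(1, 1, GRID, GRID)
a = select_relay(s)
r = torch.tensor([1.0])            # observed utility after serving subscribers
s_next = torch.rand(1, 1, GRID, GRID)
td_update(s, torch.tensor([a]), r, s_next)
```

A full DQN would typically add experience replay and a target network for training stability; both are omitted here for brevity.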

Keywords