Random-Delay-Corrected Deep Reinforcement Learning Framework for Real-World Online Closed-Loop Network Automation

Keliang Du; Luhan Wang; Yu Liu; Haiwen Niu; Shaoxin Huang; Xiangming Wen

doi:10.3390/app122312297

Applied Sciences (Dec 2022)

Random-Delay-Corrected Deep Reinforcement Learning Framework for Real-World Online Closed-Loop Network Automation

Keliang Du,
Luhan Wang,
Yu Liu,
Haiwen Niu,
Shaoxin Huang,
Xiangming Wen

Affiliations

Keliang Du: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
Luhan Wang: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
Yu Liu: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
Haiwen Niu: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
Shaoxin Huang: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China
Xiangming Wen: Beijing Laboratory of Advanced Information Networks, Beijing University of Posts and Telecommunications, Beijing 100876, China

DOI: https://doi.org/10.3390/app122312297
Journal volume & issue: Vol. 12, no. 23
p. 12297

Abstract

Read online

The future mobile communication networks (beyond 5th generation (5G)) are evolving toward the service-based architecture where network functions are fine-grained, thereby meeting the dynamic requirements of diverse and differentiated vertical applications. Consequently, the complexity of network management becomes higher, and artificial intelligence (AI) technologies can improve AI-native network automation with their ability to solve complex problems. Specifically, deep reinforcement learning (DRL) technologies are considered the key to intelligent network automation with a feedback mechanism similar to that of online closed-loop architecture. However, the 0-delay assumptions of the standard Markov decision process (MDP) of traditional DRL algorithms cannot directly be adopted into real-world networks because there exist random delays between the agent and the environment that will affect the performance significantly. To address this problem, this paper proposes a random-delay-corrected framework. We first abstract the scenario and model it as a partial history-dependent MDP (PH-MDP), and prove that it can be transformed to be the standard MDP solved by the traditional DRL algorithms. Then, we propose a random-delay-corrected DRL framework with a forward model and a delay-corrected trajectory sampling to obtain samples by continuous interactions to train the agent. Finally, we propose a delayed-deep-Q-network (delayed-DQN) algorithm based on the framework. For the evaluation, we develop a real-world cloud-native 5G core network prototype whose management architecture follows an online closed-loop mechanism. A use case on the top of the prototype namely delayed-DQN-enabled access and mobility management function (AMF) scaling is implemented for specific evaluations. Several experiments are designed and the results show that our proposed methodologies perform better in the random-delayed networks than other methods (e.g., the standard DQN algorithm).

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci

About the journal

Abstract

Keywords