IEEE Access (Jan 2020)

A Hybrid Learning Framework for Service Function Chaining Across Geo-Distributed Data Centers

  • Tao Tang,
  • Binwei Wu,
  • Guangmin Hu

DOI
https://doi.org/10.1109/ACCESS.2020.3024135
Journal volume & issue
Vol. 8
pp. 170225 – 170236

Abstract

Service function chaining (SFC) focuses mainly on deploying network functions in geographically distributed data centers and providing interconnect routing among them. Traditional SFC algorithms based on convex optimization have drawbacks in scalability and accuracy. Recent research has shown the effectiveness of deep reinforcement learning (DRL) for SFC. However, current DRL-based algorithms have an extremely large action space, which leads to slow convergence and poor scalability. Some researchers mitigate this issue by reformulating the SFC problem, but doing so usually results in low utilization and high cost. To address this issue, we develop a hybrid DRL-based framework that decouples virtual network function (VNF) deployment and flow routing into different modules. In the proposed framework, a DRL agent is responsible only for learning the VNF deployment policy. We customize the structure of the agent based on deep deterministic policy gradient (DDPG) and adopt several techniques to improve learning efficiency, such as adaptive parameter noise, the Wolpertinger policy, and prioritized experience replay. Flow routing is conducted in a game-based module (GBM), for which we design a decentralized routing algorithm to address scalability. The end-to-end latency of flows is minimized while the resource capacity and location constraints are satisfied. During learning, the DRL agent improves its deployment policy using the reward from the GBM, whose value depends on flow routing. Thus, VNF deployment and flow routing are still jointly optimized. Compared to existing DRL-based algorithms, the proposed hybrid DRL framework can achieve a lower cost because 1) the action space is significantly reduced by decoupling flow routing, and 2) the flow routing procedure is more efficient (the GBM uses model-based information, e.g., the gradient). Through trace-driven simulations, we show the efficiency of our algorithm compared to existing DRL-based algorithms.
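
The decoupling described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the DDPG agent is replaced by a simple epsilon-greedy value learner, the game-based routing module is replaced by a fixed latency lookup, and the three-site topology, the `LATENCY` matrix, and the function names `route_latency` and `train` are all hypothetical. The only point it shows is the division of labor, where the learner chooses a deployment and the routing step returns the reward.

```python
import random

# Hypothetical toy instance (not from the paper): three data centers and one
# flow from site 0 to site 2 that must pass through a single VNF.
LATENCY = [
    [0.0, 2.0, 9.0],
    [2.0, 0.0, 3.0],
    [9.0, 3.0, 0.0],
]
SRC, DST = 0, 2


def route_latency(vnf_site):
    """Stand-in for the routing module: latency of src -> VNF site -> dst."""
    return LATENCY[SRC][vnf_site] + LATENCY[vnf_site][DST]


def train(episodes=500, epsilon=0.1, lr=0.1, seed=0):
    """Epsilon-greedy learner standing in for the DRL deployment agent.

    The action space contains only deployment choices; routing is handled
    separately and reaches the learner only through the reward.
    """
    rng = random.Random(seed)
    q = [0.0] * len(LATENCY)  # estimated reward for each deployment site
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(q))  # explore a random site
        else:
            action = max(range(len(q)), key=q.__getitem__)  # exploit
        reward = -route_latency(action)  # reward supplied by the routing step
        q[action] += lr * (reward - q[action])  # move estimate toward reward
    return q


q_values = train()
best_site = max(range(len(q_values)), key=q_values.__getitem__)
```

In this toy instance, placing the VNF at site 1 gives an end-to-end latency of 5, against 9 for the other two sites, so the learner settles on site 1. Because the learner never enumerates routes, its action space stays at one entry per candidate site.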

Keywords