Global Energy Interconnection (Feb 2023)

Deep reinforcement learning based multi-level dynamic reconfiguration for urban distribution network: A cloud-edge collaboration architecture

  • Siyuan Jiang,
  • Hongjun Gao,
  • Xiaohui Wang,
  • Junyong Liu,
  • Kunyu Zuo

Journal volume & issue
Vol. 6, no. 1
pp. 1 – 14

Abstract

With the construction of the power Internet of Things (IoT), communication between smart devices in urban distribution networks has been gradually moving towards high speed, high compatibility, and low latency, which provides reliable support for reconfiguration optimization in urban distribution networks. Thus, this study proposed a deep reinforcement learning based multi-level dynamic reconfiguration method for urban distribution networks in a cloud-edge collaboration architecture to obtain a real-time optimal multi-level dynamic reconfiguration solution. First, the multi-level dynamic reconfiguration method was discussed, which covered the feeder, transformer, and substation levels. Subsequently, the multi-agent system was combined with the cloud-edge collaboration architecture to build a deep reinforcement learning model for multi-level dynamic reconfiguration in an urban distribution network. The cloud-edge collaboration architecture can effectively support the multi-agent system in operating under a “centralized training and decentralized execution” mode and improve the learning efficiency of the model. Thereafter, for the multi-agent system, this study adopted a combination of offline and online learning so that the model can automatically optimize and update its strategy. In the offline learning phase, a Q-learning-based multi-agent conservative Q-learning (MACQL) algorithm was proposed to stabilize the learning results and reduce the risk carried into the subsequent online learning phase. In the online learning phase, a multi-agent deep deterministic policy gradient (MADDPG) algorithm based on policy gradients was proposed to explore the action space and update the experience pool. Finally, the effectiveness of the proposed method was verified through a simulation analysis of a real-world 445-node system.
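
To make the offline phase more concrete, the following is a minimal single-agent, tabular sketch of the general conservative Q-learning idea: a standard temporal-difference loss plus a penalty that lowers the values of actions absent from the logged data. It is not the MACQL algorithm of the paper. The function names, the two-state table, and the values of alpha, gamma, and lr are illustrative assumptions.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q_row, action):
    """Conservative term: logsumexp over all actions minus Q of the logged action.

    The term is non-negative and grows when actions not seen in the data
    carry high Q-values.
    """
    return logsumexp(q_row) - q_row[action]


def cql_tabular_update(q, transition, alpha=1.0, gamma=0.9, lr=0.1):
    """One semi-gradient step on 0.5 * td_error**2 + alpha * cql_penalty.

    q is a list of per-state lists of action values (hypothetical layout);
    transition is one logged tuple (state, action, reward, next_state).
    """
    s, a, r, s_next = transition
    td_target = r + gamma * max(q[s_next])  # treated as a constant
    td_error = q[s][a] - td_target
    lse = logsumexp(q[s])
    soft = [math.exp(v - lse) for v in q[s]]  # gradient of logsumexp
    for b in range(len(q[s])):
        grad = alpha * soft[b]
        if b == a:
            grad += td_error - alpha
        q[s][b] -= lr * grad
    return q


# Assumed toy data: two states, two actions, one logged transition.
q_table = [[0.0, 0.0], [0.0, 0.0]]
q_table = cql_tabular_update(q_table, (0, 0, 1.0, 1))
```

In this toy step the logged action's value rises while the unlogged action's value falls, which is the pessimism that keeps an offline-trained policy close to the data before online exploration begins.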

Keywords