Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning

Hengjie Li; Jianghao Zhu; Yun Zhou; Qi Feng; Donghan Feng

doi:10.1155/2022/6854620

International Transactions on Electrical Energy Systems (Jan 2022)

Affiliations

Hengjie Li: School of Electrical Engineering and Information Engineering
Jianghao Zhu: School of Electrical Engineering and Information Engineering
Yun Zhou: School of Electrical Engineering and Information Engineering
Qi Feng: School of Electrical Engineering and Information Engineering
Donghan Feng: School of Electrical Engineering and Information Engineering

DOI: https://doi.org/10.1155/2022/6854620
Journal volume & issue: Vol. 2022

Abstract

Maximizing the returns of electric vehicle charging station (EVCS) operation helps to expand EVCS infrastructure, which in turn grows the electric vehicle (EV) stock and helps address climate change. However, in dynamic regulation scenarios with large data volumes, many variables, and short time scales, existing regulation strategies that aim to maximize EVCS returns often fail to meet the demand. To handle increasingly complex regulation scenarios, this paper uses a deep reinforcement learning (DRL) algorithm based on an improved twin delayed deep deterministic policy gradient (TD3) to construct basic energy management strategies. To make the strategy better suited to real-time energy regulation, we use Thompson sampling to improve TD3's exploration noise sampling, which greatly accelerates the initial convergence of TD3 during training. We also use marginalized importance sampling to calculate the Q-return function for TD3, which makes the constructed strategies more likely to learn from high-value experiences while being more robust. Numerical experiments show that the charging station management strategy (CSMS) based on the modified TD3 converges fastest, is the most robust, and achieves the largest operational returns compared with CSMSs constructed using deep deterministic policy gradient (DDPG), actor-critic using Kronecker-factored trust region (ACKTR), trust region policy optimization (TRPO), proximal policy optimization (PPO), soft actor-critic (SAC), and the original TD3.
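
The abstract does not spell out how Thompson sampling is combined with TD3's exploration noise, so the following Python sketch shows only one possible reading, not the authors' implementation. It assumes that a small set of candidate noise scales is treated as bandit arms, that the mean episode return of each arm has a Gaussian posterior with unit observation variance, and that actions are scalars clipped to [-1, 1]. The class name `NoiseScaleSelector`, its methods, the candidate scales, and the synthetic returns are all hypothetical.

```python
import random


class NoiseScaleSelector:
    """Thompson sampling over candidate exploration-noise scales (hypothetical).

    Each candidate sigma is a bandit arm. The mean return obtained with that
    sigma is modelled by a Gaussian posterior, assuming unit observation variance.
    """

    def __init__(self, sigmas, prior_mean=0.0, prior_precision=1.0, seed=None):
        self.sigmas = list(sigmas)
        self.mean = [prior_mean] * len(self.sigmas)
        self.precision = [prior_precision] * len(self.sigmas)
        self.rng = random.Random(seed)

    def select(self):
        """Draw one sample per arm from its posterior and pick the largest."""
        draws = [
            self.rng.gauss(m, (1.0 / p) ** 0.5)
            for m, p in zip(self.mean, self.precision)
        ]
        return max(range(len(draws)), key=draws.__getitem__)

    def update(self, arm, episode_return):
        """Conjugate Gaussian update with unit observation variance."""
        p = self.precision[arm]
        self.mean[arm] = (p * self.mean[arm] + episode_return) / (p + 1.0)
        self.precision[arm] = p + 1.0

    def noisy_action(self, action, arm, low=-1.0, high=1.0):
        """Add Gaussian noise with the chosen scale, then clip to the bounds."""
        noise = self.rng.gauss(0.0, self.sigmas[arm])
        return min(high, max(low, action + noise))


# Toy demonstration with synthetic returns: the middle scale is made to look best.
selector = NoiseScaleSelector([0.05, 0.2, 0.5], seed=0)
for _ in range(50):
    for arm in range(3):
        fake_return = 10.0 if arm == 1 else 0.0  # assumed, not measured data
        selector.update(arm, fake_return)

chosen = selector.select()
action = selector.noisy_action(0.3, chosen)
```

In a real training loop, `update` would be fed the return observed after acting with the chosen scale, so scales that lead to better early returns are sampled more often. Whether the paper uses this bandit formulation or a different one cannot be determined from the abstract.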

Published in International Transactions on Electrical Energy Systems

ISSN: 2050-7038 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://onlinelibrary.wiley.com/journal/itees
