工程科学学报 (Mar 2024)

Regenerative braking strategy based on deep reinforcement learning for an electric mining truck

  • Weiwei YANG,
  • Denghao LUO,
  • Wenming ZHANG

DOI
https://doi.org/10.13374/j.issn2095-9389.2023.06.01.003
Journal volume & issue
Vol. 46, no. 3
pp. 503 – 513

Abstract

With the promotion of the national “carbon neutral” and “green mine” strategies, pure electric mining vehicles play a crucial role in energy conservation and emission reduction in the mining industry. However, “range anxiety” remains the primary obstacle to their adoption. Regenerative braking is an essential technology for improving energy efficiency and reducing the life-cycle cost of pure electric vehicles. Because of harsh driving conditions and large variations in payload and road slope, both the magnitude and the fluctuation of energy demand change sharply during operation, which affects the energy recovery efficiency and battery life of an electric mining dump truck. Designing an appropriate regenerative braking strategy for such trucks is therefore crucial. This paper takes a 50-ton pure electric mining truck as the research object and proposes a regenerative braking strategy based on deep reinforcement learning. First, a mathematical model of the truck was established, comprising a permanent magnet synchronous motor, a power battery, a four-speed automated mechanical transmission, and a vehicle longitudinal dynamics model, and its power performance was verified on the Matlab/Simulink simulation platform. Subsequently, energy management strategies that account for load and slope changes were developed using two deep reinforcement learning algorithms: soft actor–critic (SAC) and deep deterministic policy gradient (DDPG). The state variables are vehicle speed, acceleration, vehicle mass, road slope, battery state of charge (SOC), and battery charge–discharge rate. The action variable is the transmission gear. The reward function is built from battery SOC and battery lifetime.
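As an illustration only, the state, action, and reward described above could be laid out as in the following Python sketch. The field names, units, the weights `w_soc` and `w_life`, and the linear form of the reward are all assumptions made for this example; the abstract does not specify how the SOC and lifetime terms are combined.

```python
from dataclasses import dataclass

# Action space: the gear of the four-speed automated mechanical transmission.
GEARS = (1, 2, 3, 4)


@dataclass(frozen=True)
class TruckState:
    """The six state variables listed in the abstract (units are assumed)."""
    speed_mps: float    # vehicle speed
    accel_mps2: float   # vehicle acceleration
    mass_kg: float      # vehicle mass, which changes with payload
    slope_rad: float    # road slope
    soc: float          # battery state of charge, in [0, 1]
    c_rate: float       # battery charge-discharge rate

    def as_vector(self) -> tuple:
        """Flatten the state into the order an agent would consume."""
        return (self.speed_mps, self.accel_mps2, self.mass_kg,
                self.slope_rad, self.soc, self.c_rate)


def reward(delta_soc: float, life_loss: float,
           w_soc: float = 1.0, w_life: float = 1.0) -> float:
    """Hypothetical reward: favour recovered SOC, penalise battery ageing.

    delta_soc -- SOC gained over the step (positive when energy is recovered)
    life_loss -- battery lifetime consumed over the step (assumed non-negative)
    """
    return w_soc * delta_soc - w_life * life_loss
```

With these assumed weights, a braking step that recovers 2% SOC at a lifetime cost of 0.005 would score `reward(0.02, 0.005)`, so the agent is pushed toward gears that recover energy without stressing the battery.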
Furthermore, an automatic entropy adjustment mechanism is introduced to improve the adaptability of the proposed control strategy to different operating conditions. Simulation results show that, compared with the rule-based control strategy, the strategies based on dynamic programming, SAC, and DDPG improve energy efficiency by 18.15%, 17.18%, and 16.63%, respectively, and improve battery lifetime by 57.31%, 56.87%, and 57.38%, respectively. Finally, the reward curves of the proposed SAC-based energy management strategy are compared with those of the DDPG-based strategy to further verify its superiority. The results demonstrate the feasibility of the proposed SAC-based control strategy, which improves convergence speed by 166.7%.
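The abstract does not detail the automatic entropy adjustment. In the standard SAC formulation, the temperature α is learned by minimising J(α) = E[−α (log π(a|s) + H_target)]. The sketch below applies one gradient step to log α in plain Python. The function name, the learning rate, and the target entropy value are assumptions for illustration, not the authors' settings.

```python
import math


def update_log_alpha(log_alpha: float, log_probs: list,
                     target_entropy: float, lr: float = 1e-3) -> float:
    """One gradient-descent step on the SAC temperature objective.

    J(alpha) = mean(-alpha * (log_pi + target_entropy)), alpha = exp(log_alpha)
    dJ/dlog_alpha = -alpha * mean(log_pi + target_entropy)
    """
    alpha = math.exp(log_alpha)
    mean_gap = sum(lp + target_entropy for lp in log_probs) / len(log_probs)
    grad = -alpha * mean_gap
    return log_alpha - lr * grad


# Assumed target: half the maximum entropy of a four-way choice.
target = 0.5 * math.log(4)

# Near-deterministic policy (log-probabilities near 0): entropy is below the
# target, so the temperature rises and exploration is encouraged.
raised = update_log_alpha(0.0, [0.0, 0.0], target)

# Uniform policy (log-probabilities of log 0.25): entropy is above the
# target, so the temperature falls.
lowered = update_log_alpha(0.0, [math.log(0.25)] * 2, target)
```

Because α adapts to how random the policy currently is, the exploration bonus does not have to be hand-tuned for each load and slope condition.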

Keywords