Energy Reports (Nov 2021)
Multi-objective optimal control for proton exchange membrane fuel cell via large-scale deep reinforcement learning
Abstract
To achieve high operating efficiency and control performance in a proton exchange membrane fuel cell (PEMFC), a multi-objective optimal fractional-order proportional-integral-derivative (FOPID) controller is proposed that simultaneously holds the oxygen excess ratio (OER) and the output voltage at their reference values. In addition, a novel large-scale deep reinforcement learning algorithm, the demonstration curriculum strategy large-scale multi-delay deep deterministic policy gradient (DCSL-MD3PG) algorithm, is designed to serve as the tuner of this controller. The algorithm applies a teacher–student large-scale deep reinforcement learning strategy, a demonstration curriculum learning strategy, and a number of other techniques during offline training to improve the robustness and the optimal control ability of the controller. Simulation results show that, by exploiting the adaptability and model-free nature of deep reinforcement learning, the adaptive optimal FOPID algorithm ensures optimal and comprehensive control of the OER and output voltage while satisfying the security constraints of the PEMFC.
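For readers unfamiliar with fractional-order control, the FOPID law can be illustrated with a minimal sketch. It assumes the common textbook form u = Kp·e + Ki·D^(-λ)e + Kd·D^(μ)e and a truncated Grünwald-Letnikov approximation of the fractional operators; neither the discretization, the class name `FOPID`, the helper `gl_weights`, nor the memory length is taken from the paper. The DCSL-MD3PG tuner is not shown; in the proposed scheme it would be the component that supplies the controller parameters.

```python
from collections import deque


def gl_weights(alpha, n):
    """First n Grunwald-Letnikov weights for a derivative of order alpha.

    A negative alpha gives a fractional integral of order -alpha.
    """
    w = [1.0]
    for j in range(1, n):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / j))
    return w


class FOPID:
    """Illustrative discrete FOPID controller (hypothetical, not the paper's code)."""

    def __init__(self, kp, ki, kd, lam, mu, dt, memory=200):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.lam, self.mu, self.dt = lam, mu, dt
        self.w_int = gl_weights(-lam, memory)  # weights for integral order lam
        self.w_der = gl_weights(mu, memory)    # weights for derivative order mu
        self.errors = deque(maxlen=memory)     # error history, newest first

    def step(self, reference, measurement):
        """Return the control signal for one sampling instant."""
        e = reference - measurement
        self.errors.appendleft(e)
        integ = self.dt ** self.lam * sum(
            w * x for w, x in zip(self.w_int, self.errors))
        deriv = self.dt ** (-self.mu) * sum(
            w * x for w, x in zip(self.w_der, self.errors))
        return self.kp * e + self.ki * integ + self.kd * deriv
```

With λ = μ = 1 the weights reduce to a rectangular-rule integral and a backward-difference derivative, so the sketch falls back to an ordinary discrete PID. In a two-objective setting such as the one described above, one such controller per regulated quantity (OER and output voltage) would be a plausible arrangement, though the abstract does not specify the structure.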