IEEE Access (Jan 2021)

A Nearer Optimal and Faster Trained Value Iteration ADP for Discrete-Time Nonlinear Systems

  • Junping Hu,
  • Gen Yang,
  • Zhicheng Hou,
  • Gong Zhang,
  • Wenlin Yang,
  • Weijun Wang

DOI
https://doi.org/10.1109/ACCESS.2021.3051984
Journal volume & issue
Vol. 9
pp. 14933–14944

Abstract

Adaptive dynamic programming (ADP) is generally implemented with three neural networks: a model network, an action network, and a critic network. In conventional value iteration ADP, the model network is initialized randomly and trained by the backpropagation algorithm, so its weights easily become trapped in a local minimum; moreover, both the critic network and the action network are trained in every outer-loop iteration, which is time-consuming. To approximate the optimal control policy more accurately and to reduce the training time of value iteration ADP, we propose a nearer optimal and faster trained value iteration ADP for discrete-time nonlinear systems. First, before training the model network with the backpropagation algorithm, we use a global search method, namely a genetic algorithm, to evolve the weights and biases of the neural network for a few generations. Second, in the outer-loop training process, we propose a trigger mechanism that decides whether or not to train the action network, which saves considerable training time. Examples of both linear and nonlinear systems are used to verify the superiority of the proposed method over conventional value iteration ADP. The simulation results show that the proposed algorithm provides a control policy closer to the optimal one and requires less training time than conventional value iteration ADP.
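
As a concrete illustration of the first contribution, the following is a minimal sketch (Python with NumPy) of genetic-algorithm pre-training for the model network. Everything here is assumed for illustration: the layer sizes, the truncation-selection-plus-Gaussian-mutation operators, the toy plant x_{k+1} = 0.8*sin(x_k) + 0.5*u_k, and the function names (init_net, ga_pretrain); the paper's own GA operators and benchmark systems may differ.

    import numpy as np

    rng = np.random.default_rng(0)

    def init_net(sizes):
        """Random weights and biases for a small MLP (the model network)."""
        return [(rng.normal(0.0, 0.5, (m, n)), np.zeros(n))
                for m, n in zip(sizes[:-1], sizes[1:])]

    def forward(params, x):
        for i, (W, b) in enumerate(params):
            x = x @ W + b
            if i < len(params) - 1:
                x = np.tanh(x)
        return x

    def loss(params, X, Y):
        return float(np.mean((forward(params, X) - Y) ** 2))

    def ga_pretrain(sizes, X, Y, pop_size=20, generations=5, sigma=0.1):
        """Evolve weights/biases for a few generations before backpropagation.
        Hypothetical GA: keep the best quarter, refill by Gaussian mutation."""
        pop = [init_net(sizes) for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=lambda p: loss(p, X, Y))
            elite = pop[: pop_size // 4]
            while len(elite) < pop_size:
                parent = elite[rng.integers(pop_size // 4)]
                elite.append([(W + sigma * rng.normal(size=W.shape),
                               b + sigma * rng.normal(size=b.shape))
                              for W, b in parent])
            pop = elite
        return min(pop, key=lambda p: loss(p, X, Y))

    # Synthetic identification data for a toy plant (assumed for illustration):
    # x_{k+1} = 0.8*sin(x_k) + 0.5*u_k, with inputs (x_k, u_k) drawn uniformly.
    X = rng.uniform(-1, 1, (200, 2))
    Y = 0.8 * np.sin(X[:, :1]) + 0.5 * X[:, 1:2]
    best = ga_pretrain([2, 8, 1], X, Y)
    print("model-network MSE after GA pre-training:", loss(best, X, Y))

The returned individual would then seed the usual backpropagation training of the model network, so gradient descent starts from a globally searched region of weight space rather than from a purely random draw.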
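The second contribution, the outer-loop trigger, can be sketched on a scalar linear-quadratic example where the critic and actor reduce to single parameters (critic V(x) = p*x^2, actor u = K*x). The threshold rule below, which retrains the actor only when the critic parameter has moved by more than trigger_tol, is an assumed, illustrative form of the trigger; the paper's exact criterion may differ.

    # 1-D linear plant x_{k+1} = a*x_k + b*u_k with stage cost x^2 + u^2.
    a, b = 0.9, 0.5
    p, K = 0.0, 0.0          # critic parameter and actor gain
    trigger_tol = 1e-3       # hypothetical trigger threshold

    for i in range(200):
        # Critic update: one value-iteration step under the current actor.
        p_new = 1.0 + K ** 2 + p * (a + b * K) ** 2
        # Trigger: retrain the actor only if the critic changed enough.
        if abs(p_new - p) > trigger_tol:
            K = -p_new * a * b / (1.0 + p_new * b ** 2)
        if abs(p_new - p) < 1e-10:   # outer-loop convergence
            break
        p = p_new

    print(f"iterations: {i + 1}, critic p = {p:.4f}, actor gain K = {K:.4f}")

Whenever the critic barely changes between outer iterations, the actor update is skipped; in the full three-network setting that skipped update is an entire action-network training pass, which is where the reported training-time savings would come from.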

Keywords