ICT Express (Dec 2023)

A tutorial on the art of dynamic programming for some issues concerning Bellman’s principle of optimality

  • Eiji Mizutani,
  • Stuart Dreyfus

Journal volume & issue
Vol. 9, no. 6
pp. 1144 – 1161

Abstract

Reinforcement learning (RL) is fundamental to current artificial intelligence (AI). Since dynamic programming (DP), which rests on Richard Bellman’s principle of optimality, underlies RL (and other AI disciplines such as A* search), it is important to apply that principle correctly and artfully. This tutorial uses examples, many drawn from published articles, to examine both artful and occasionally erroneous applications of Bellman’s principle. Our contribution is twofold. First, we emphasize the necessity of a general thought experiment, asking what we call “consultant questions,” to determine a proper state space on which a DP formulation can be established by Bellman’s optimality principle. Second, we convey the role of art, grounded in common sense and effortful cleverness, in designing efficient DP. Because Bellman’s principle appeals to intuitive common sense, it is sometimes misunderstood. For illustrative purposes, given a directed acyclic graph (DAG), we consider four deterministic minimum-cost path network problems (found in the literature), each raising a validity issue concerning Bellman’s principle under a different criterion. We approach them consistently through the conventional wisdom of state augmentation (on top of the physical system’s state space). As a result, we show that an artful choice of state description not only renders the principle valid in each problem but also makes each DP as efficient as the standard DP that solves a shortest-path problem in the same DAG, successfully circumventing the so-called curse of dimensionality, the price frequently paid for state enlargement. Finally, we discuss the potential importance of this type of path network problem for practical applications (e.g., communication networks).
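
As a minimal illustration of two ideas mentioned in the abstract (the standard shortest-path DP on a DAG, and state augmentation when the plain node is not an adequate state description), consider the Python sketch below. It is not taken from the paper and solves none of its four problems; the toy graph, the hop-count restriction used as the augmenting state variable, and all identifiers are illustrative assumptions.

```python
from math import inf

# Illustrative DAG as an adjacency list {node: [(successor, edge_cost), ...]};
# nodes are assumed to be given in a known topological order.
DAG = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 7)],
    "b": [("t", 3)],
    "t": [],
}
TOPO_ORDER = ["s", "a", "b", "t"]


def shortest_path(dag, topo_order, source, target):
    """Standard DP: Bellman's recursion over nodes in topological order.

    cost[v] = min over edges (u, v) of (cost[u] + c(u, v)); an optimal
    path to v must extend an optimal path to some predecessor u.
    """
    cost = {v: inf for v in topo_order}
    cost[source] = 0.0
    for u in topo_order:              # process nodes in topological order
        for v, c in dag[u]:
            cost[v] = min(cost[v], cost[u] + c)
    return cost[target]


def shortest_path_at_most_k_edges(dag, topo_order, source, target, k):
    """Augmented-state DP (illustrative): state = (node, edges used so far).

    When the node alone is not a sufficient state description (here, because
    of a hop-count restriction), enlarging the state restores the validity
    of Bellman's principle at the price of a larger, but still small, state space.
    """
    cost = {(v, j): inf for v in topo_order for j in range(k + 1)}
    cost[(source, 0)] = 0.0
    for u in topo_order:
        for j in range(k + 1):
            if cost[(u, j)] == inf:
                continue
            for v, c in dag[u]:
                if j + 1 <= k:
                    cost[(v, j + 1)] = min(cost[(v, j + 1)], cost[(u, j)] + c)
    return min(cost[(target, j)] for j in range(k + 1))


if __name__ == "__main__":
    print(shortest_path(DAG, TOPO_ORDER, "s", "t"))                     # 6.0 via s -> a -> b -> t
    print(shortest_path_at_most_k_edges(DAG, TOPO_ORDER, "s", "t", 2))  # 8.0 via s -> b -> t
```

In the augmented version the state space grows from the set of nodes to node/hop-count pairs, yet it remains polynomial in the size of the DAG; this reflects, under the stated toy assumptions, the spirit of the abstract's claim that a well-chosen state description can avoid the curse of dimensionality usually associated with state enlargement.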

Keywords