ICT Express (Dec 2023)

A tutorial on the art of dynamic programming for some issues concerning Bellman’s principle of optimality

  • Eiji Mizutani,
  • Stuart Dreyfus

Journal volume & issue
Vol. 9, no. 6
pp. 1144 – 1161

Abstract

Reinforcement learning (RL) is fundamental to current artificial intelligence (AI). Since dynamic programming (DP), which rests on Richard Bellman’s principle of optimality, underlies RL (and other AI disciplines such as A* search), it is important to apply that principle correctly and artfully. This tutorial uses examples, many drawn from published articles, to examine both artful and occasionally erroneous applications of Bellman’s principle. Our contribution is twofold. First, we emphasize the necessity of a general thought experiment, asking what we call “consultant questions,” to determine a proper state space on which a DP formulation can be established by Bellman’s optimality principle. Second, we convey the role of art, grounded in common sense and effortful cleverness, in designing efficient DP. Because Bellman’s principle appeals to intuitive common sense, it is sometimes misunderstood. For illustrative purposes, given a directed acyclic graph (DAG), we consider four deterministic minimum-cost path network problems (found in the literature), each raising a validity issue concerning Bellman’s principle under a different criterion. We approach them consistently through the conventional wisdom of state augmentation (on top of the physical system’s state space). As a result, we show that an artful choice of state description not only renders the principle valid in each problem but also makes each DP as efficient as the standard DP that solves a shortest-path problem in the same DAG, successfully circumventing the so-called curse of dimensionality, the price frequently paid for state enlargement. Finally, we discuss the potential importance of this type of path network problem for practical applications (e.g., communication networks).
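
As a minimal illustration of two ideas mentioned in the abstract (the standard shortest-path DP on a DAG, and state augmentation when the plain node is not an adequate state description), consider the Python sketch below. It is not taken from the paper and solves none of its four problems; the toy graph, the hop-count restriction used as the augmenting state variable, and all identifiers are illustrative assumptions.

```python
from math import inf

# Illustrative DAG as an adjacency list {node: [(successor, edge_cost), ...]};
# nodes are assumed to be given in a known topological order.
DAG = {
    "s": [("a", 2), ("b", 5)],
    "a": [("b", 1), ("t", 7)],
    "b": [("t", 3)],
    "t": [],
}
TOPO_ORDER = ["s", "a", "b", "t"]


def shortest_path(dag, topo_order, source, target):
    """Standard DP: Bellman's recursion over nodes in topological order.

    cost[v] = min over edges (u, v) of (cost[u] + c(u, v)); an optimal
    path to v must extend an optimal path to some predecessor u.
    """
    cost = {v: inf for v in topo_order}
    cost[source] = 0.0
    for u in topo_order:              # process nodes in topological order
        for v, c in dag[u]:
            cost[v] = min(cost[v], cost[u] + c)
    return cost[target]


def shortest_path_at_most_k_edges(dag, topo_order, source, target, k):
    """Augmented-state DP (illustrative): state = (node, edges used so far).

    When the node alone is not a sufficient state description (here, because
    of a hop-count restriction), enlarging the state restores the validity
    of Bellman's principle at the price of a larger, but still small, state space.
    """
    cost = {(v, j): inf for v in topo_order for j in range(k + 1)}
    cost[(source, 0)] = 0.0
    for u in topo_order:
        for j in range(k + 1):
            if cost[(u, j)] == inf:
                continue
            for v, c in dag[u]:
                if j + 1 <= k:
                    cost[(v, j + 1)] = min(cost[(v, j + 1)], cost[(u, j)] + c)
    return min(cost[(target, j)] for j in range(k + 1))


if __name__ == "__main__":
    print(shortest_path(DAG, TOPO_ORDER, "s", "t"))                     # 6.0 via s -> a -> b -> t
    print(shortest_path_at_most_k_edges(DAG, TOPO_ORDER, "s", "t", 2))  # 8.0 via s -> b -> t
```

In the augmented version the state space grows from the set of nodes to node/hop-count pairs, yet it remains polynomial in the size of the DAG; this reflects, under the stated toy assumptions, the spirit of the abstract's claim that a well-chosen state description can avoid the curse of dimensionality usually associated with state enlargement.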

Keywords