A Markov chain Monte Carlo algorithm for Bayesian policy search

Vahid Tavakol Aghaei; Ahmet Onat; Sinan Yıldırım

doi:10.1080/21642583.2018.1528483

Systems Science & Control Engineering (Jan 2018)

Affiliations

Vahid Tavakol Aghaei: Sabancı University
Ahmet Onat: Sabancı University
Sinan Yıldırım: Sabancı University

DOI: https://doi.org/10.1080/21642583.2018.1528483
Journal volume & issue: Vol. 6, no. 1, pp. 438–455

Abstract

Policy search algorithms have facilitated the application of Reinforcement Learning (RL) to dynamic systems, such as the control of robots. Many policy search algorithms are based on the policy gradient and may therefore suffer from slow convergence or from convergence to local optima. In this paper, we take a Bayesian approach to policy search under the RL paradigm, for the problem of controlling a discrete-time Markov decision process with continuous state and action spaces and a multiplicative reward structure. To this end, we place a prior over the policy parameters and target the ‘posterior’ distribution in which the ‘likelihood’ is the expected reward. We propose a Markov chain Monte Carlo algorithm for generating samples of the policy parameters from this posterior. The proposed algorithm is compared with several well-known policy gradient-based RL methods and performs better in terms of time response and convergence rate when applied to a nonlinear model of a Cart-Pole benchmark.
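
The idea in the abstract, treating the expected reward as a 'likelihood' and sampling policy parameters from prior times expected reward, can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the scalar linear system, the reward factor exp(-x^2/2), the Gaussian prior, the random-walk proposal, and all function names are hypothetical. The expected reward is replaced by a Monte Carlo average over simulated trajectories, and the estimate for the current parameter is stored and reused, which is one standard way to run Metropolis-Hastings with a noisy, non-negative likelihood estimate. The paper's actual algorithm and Cart-Pole setup may differ.

```python
import math
import random


def simulate_reward(theta, rng, horizon=20):
    """Multiplicative reward of one trajectory (toy system, an assumption)."""
    x = 1.0
    reward = 1.0
    for _ in range(horizon):
        u = -theta * x + rng.gauss(0.0, 0.05)   # stochastic linear policy
        x = x + u + rng.gauss(0.0, 0.1)         # toy dynamics x' = x + u + noise
        reward *= math.exp(-0.5 * x * x)        # per-step reward factor in (0, 1]
        if reward == 0.0:                       # underflow: trajectory diverged
            break
    return reward


def estimate_expected_reward(theta, rng, n_traj=20):
    """Monte Carlo estimate of the expected reward, used as the 'likelihood'."""
    return sum(simulate_reward(theta, rng) for _ in range(n_traj)) / n_traj


def prior(theta, scale=2.0):
    """Unnormalised zero-mean Gaussian prior over the policy parameter."""
    return math.exp(-0.5 * (theta / scale) ** 2)


def mcmc_policy_search(n_iters=3000, step=0.5, seed=0):
    """Random-walk Metropolis-Hastings targeting prior(theta) * E[reward | theta]."""
    rng = random.Random(seed)
    theta = 0.0
    j_hat = estimate_expected_reward(theta, rng)  # kept with the current state
    samples = []
    for _ in range(n_iters):
        proposal = theta + rng.gauss(0.0, step)   # symmetric proposal
        j_prop = estimate_expected_reward(proposal, rng)
        ratio = (prior(proposal) * j_prop) / (prior(theta) * j_hat)
        if rng.random() < ratio:                  # accept with prob min(1, ratio)
            theta, j_hat = proposal, j_prop
        samples.append(theta)
    return samples


if __name__ == "__main__":
    draws = mcmc_policy_search()
    tail = draws[1000:]                           # discard burn-in
    print("posterior mean of policy gain:", sum(tail) / len(tail))
```

In this toy problem the closed loop is stable for gains between 0 and 2, so the retained samples should concentrate in that interval. No gradient of the expected reward is computed at any point, which is the contrast with policy gradient methods that the abstract draws.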

Published in Systems Science & Control Engineering

ISSN: 2164-2583 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General); Technology: Engineering (General). Civil engineering (General): Systems engineering
Website: https://www.tandfonline.com/journals/tssc
