Markov Decision Processes with Multiple Long-run Average Objectives

Tomáš Brázdil; Václav Brožek; Krishnendu Chatterjee; Vojtěch Forejt; Antonín Kučera

doi:10.2168/LMCS-10(1:13)2014

Logical Methods in Computer Science (Feb 2014)

Journal volume & issue: Vol. 10, Issue 1

Abstract

We study Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) functions. We consider two different objectives, namely expectation and satisfaction objectives. Given an MDP with k limit-average functions, in the expectation objective the goal is to maximize the expected limit-average value, and in the satisfaction objective the goal is to maximize the probability of runs such that the limit-average value stays above a given vector. We show that under the expectation objective, in contrast to the case of one limit-average function, both randomization and memory are necessary for strategies even for ε-approximation, and that finite-memory randomized strategies are sufficient for achieving Pareto-optimal values. Under the satisfaction objective, in contrast to the case of one limit-average function, infinite memory is necessary for strategies achieving a specific value (i.e., randomized finite-memory strategies are not sufficient), whereas memoryless randomized strategies are sufficient for ε-approximation, for all ε > 0. We further prove that the decision problems for both expectation and satisfaction objectives can be solved in polynomial time, and that the trade-off curve (Pareto curve) can be ε-approximated in time polynomial in the size of the MDP and 1/ε, and exponential in the number of limit-average functions, for all ε > 0. Our analysis also reveals flaws in previous work on MDPs with multiple mean-payoff functions under the expectation objective, corrects these flaws, and allows us to obtain improved results.
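
The two objectives described above can be written out as follows. This is only a sketch under assumed notation, none of which appears in the abstract: rewards are taken to be attached to actions via functions r_1, ..., r_k, the limit-average is taken as a lim inf, σ denotes a strategy, s an initial state, v a target vector, and ν a probability threshold.

```latex
% Assumed notation (not fixed by the abstract): a run is
% \omega = s_1 a_1 s_2 a_2 \dots, and r_1, \dots, r_k are reward functions on actions.
\[
  \mathrm{lr}(r_i)(\omega) \;=\; \liminf_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} r_i(a_t),
  \qquad 1 \le i \le k .
\]

% Expectation objective: the vector v = (v_1, \dots, v_k) is achievable from s
% if some strategy \sigma meets every coordinate in expectation.
\[
  \exists \sigma :\quad \mathbb{E}^{\sigma}_{s}\bigl[\mathrm{lr}(r_i)\bigr] \;\ge\; v_i
  \quad \text{for all } 1 \le i \le k .
\]

% Satisfaction objective: the pair (\nu, v) is achievable from s if some strategy
% keeps all k limit-averages above v simultaneously with probability at least \nu.
\[
  \exists \sigma :\quad \mathbb{P}^{\sigma}_{s}\bigl[\, \mathrm{lr}(r_i) \ge v_i
  \ \text{for all } 1 \le i \le k \,\bigr] \;\ge\; \nu .
\]
```

Under this reading, the expectation objective bounds each coordinate separately on average, while the satisfaction objective requires all coordinates to hold jointly on the same run.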

Keywords: computer science - computer science and game theory

Published in Logical Methods in Computer Science

ISSN: 1860-5974 (Online)
Publisher: Logical Methods in Computer Science e.V.
Country of publisher: Germany
LCC subjects: Philosophy. Psychology. Religion: Logic; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://lmcs.episciences.org/
