Nonstationary Stochastic Bandits: UCB Policies and Minimax Regret

Lai Wei; Vaibhav Srivastava

doi:10.1109/OJCSYS.2024.3372929

IEEE Open Journal of Control Systems (Jan 2024)

Affiliations

Lai Wei: Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
Vaibhav Srivastava: Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI, USA

DOI: https://doi.org/10.1109/OJCSYS.2024.3372929
Journal volume & issue: Vol. 3, pp. 128–142

Abstract

We study the nonstationary stochastic Multi-Armed Bandit (MAB) problem in which the distributions of rewards associated with arms are assumed to be time-varying and the total variation in the expected rewards is subject to a variation budget. The regret of a policy is defined by the difference in the expected cumulative reward obtained using the policy and using an oracle that selects the arm with the maximum mean reward at each time. We characterize the performance of the proposed policies in terms of the worst-case regret, which is the supremum of the regret over the set of reward distribution sequences satisfying the variation budget. We design Upper-Confidence Bound (UCB)-based policies with three different approaches, namely, periodic resetting, sliding observation window, and discount factor, and show that they are order-optimal with respect to the minimax regret, i.e., the minimum worst-case regret achieved by any policy. We also relax the sub-Gaussian assumption on reward distributions and develop robust versions of the proposed policies that can handle heavy-tailed reward distributions and maintain their performance guarantees.
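
Of the three approaches named in the abstract, the sliding observation window is perhaps the simplest to illustrate. The Python sketch below is not the authors' policy: the index form (windowed empirical mean plus sqrt(c ln n / count)), the constant c, the window length, and the reward_fn callback are all assumptions chosen for illustration, and no regret guarantee is claimed for these particular choices. It only shows the general idea of computing a UCB index from the most recent observations so that stale rewards are forgotten.

```python
import math
import random
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window, c=2.0, seed=0):
    """Run a sliding-window UCB sketch and return the list of chosen arms.

    reward_fn(arm, t, rng) -> float is a hypothetical callback that returns
    the reward of pulling `arm` at round `t`. `window` and `c` are
    illustrative tuning parameters, not values taken from the paper.
    """
    rng = random.Random(seed)
    history = deque()          # (arm, reward) pairs still inside the window
    counts = [0] * n_arms      # pulls of each arm inside the window
    sums = [0.0] * n_arms      # reward totals of each arm inside the window
    choices = []

    for t in range(horizon):
        # Forget the oldest observation once the window is full.
        if len(history) == window:
            old_arm, old_reward = history.popleft()
            counts[old_arm] -= 1
            sums[old_arm] -= old_reward

        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            # An arm with no observation in the window is pulled first.
            arm = untried[0]
        else:
            n_eff = len(history)  # number of observations in the window

            def index(a):
                mean = sums[a] / counts[a]
                bonus = math.sqrt(c * math.log(n_eff) / counts[a])
                return mean + bonus

            arm = max(range(n_arms), key=index)

        reward = reward_fn(arm, t, rng)
        history.append((arm, reward))
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)

    return choices


def switching_rewards(arm, t, rng):
    """Toy noiseless environment: arm 0 is best before round 500, arm 1 after."""
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.0


choices = sliding_window_ucb(switching_rewards, n_arms=2, horizon=1000, window=100)
```

In this toy environment the best arm switches once, and the sketch shifts its pulls to the new best arm after the old observations leave the window. Periodic resetting and discounting pursue the same goal by restarting the statistics at fixed intervals or by down-weighting older rewards, respectively.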

Published in IEEE Open Journal of Control Systems

ISSN: 2694-085X (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Mechanical engineering and machinery: Control engineering systems. Automatic machinery (General)
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=9552933
