A linear response bandit problem

Assaf Zeevi; Alexander Goldenshluger

Stochastic Systems (Jan 2013)

Journal volume & issue: Vol. 3, no. 1, pp. 230–261

Abstract

We consider a two-armed bandit problem which involves sequential sampling from two non-homogeneous populations. The response in each is determined by a random covariate vector and a vector of parameters whose values are not known a priori. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that combine myopic action based on least squares estimates with a suitable "forced sampling" strategy. It is shown that the regret grows logarithmically in the time horizon n and no policy can achieve a slower growth rate over all feasible problem instances. In this setting of linear response bandits, the identity of the sub-optimal action changes with the values of the covariate vector, and the optimal policy is subject to sampling from the inferior population at a rate that grows like $\sqrt{n}$.
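
The combination of myopic least squares decisions with forced sampling described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' construction: the covariate is a single scalar drawn uniformly from [-1, 1], each arm's mean response is an intercept plus a slope times the covariate, and the forced sampling schedule (`forced_arm`, which forces each arm once near every power of two) is a hypothetical choice made here so that the number of forced pulls grows logarithmically in n. All function names and parameter values are illustrative.

```python
import random


def ols_fit(xs, ys):
    """Least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n == 0:
        return 0.0, 0.0  # no data yet: fall back to a zero estimate
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return my, 0.0  # slope not identifiable: use the sample mean
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def forced_arm(t):
    """Hypothetical forced sampling schedule (an assumption of this sketch).

    Forces arm 0 at t = 2, 4, 8, 16, ... and arm 1 at t = 3, 5, 9, 17, ...
    Returns None when the period t is not a forced sampling period.
    """
    m = t // 2
    if m > 0 and m & (m - 1) == 0:  # m is a power of two
        return t % 2
    return None


def simulate(n, beta, noise_sd=0.1, seed=0):
    """Run the sketch policy for n periods; returns (regret, forced pulls).

    beta is a pair of (intercept, slope) tuples, one per arm.
    """
    rng = random.Random(seed)
    data = [([], []), ([], [])]  # per arm: observed covariates, responses
    regret = 0.0
    forced = 0
    for t in range(1, n + 1):
        x = rng.uniform(-1.0, 1.0)
        means = [a + b * x for a, b in beta]
        arm = forced_arm(t)
        if arm is None:
            # Myopic step: play the arm whose fitted response is larger.
            est = [ols_fit(*data[i]) for i in range(2)]
            preds = [a + b * x for a, b in est]
            arm = 0 if preds[0] >= preds[1] else 1
        else:
            forced += 1
        y = means[arm] + rng.gauss(0.0, noise_sd)
        data[arm][0].append(x)
        data[arm][1].append(y)
        regret += max(means) - means[arm]
    return regret, forced


if __name__ == "__main__":
    # Arm 0 is better when x > 0, arm 1 when x < 0 (illustrative values).
    total_regret, forced_pulls = simulate(1000, [(0.0, 1.0), (0.0, -1.0)])
    print(total_regret, forced_pulls)
```

In this sketch the better arm depends on the sign of the covariate, which mirrors the abstract's remark that the identity of the sub-optimal action changes with the covariate value. For simplicity the myopic step fits on all observations of an arm, including the forced ones.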

Published in Stochastic Systems

ISSN: 1946-5238 (Print)
Publisher: Institute for Operations Research and the Management Sciences (INFORMS), Applied Probability Society
Country of publisher: United States
LCC subjects: Science: Mathematics: Probabilities. Mathematical statistics; Technology: Technology (General): Industrial engineering. Management engineering: Applied mathematics. Quantitative methods
Website: http://www.i-journals.org/ssy/index.php
