Frontiers in Artificial Intelligence (Dec 2024)

Causal contextual bandits with one-shot data integration

  • Chandrasekar Subramanian,
  • Balaraman Ravindran

DOI
https://doi.org/10.3389/frai.2024.1346700
Journal volume & issue
Vol. 7

Abstract

We study a contextual bandit setting in which the agent has access to causal side information and can also perform multiple targeted experiments, corresponding to potentially different context-action pairs, simultaneously in one shot within a budget. This new formalism provides a natural model for several real-world scenarios where parallel targeted experiments can be conducted and where some domain knowledge of causal relationships is available. We propose a new algorithm that uses a novel entropy-like measure that we introduce. We perform several experiments, using both purely synthetic data and a real-world dataset. In addition, we study the sensitivity of our algorithm's performance to various aspects of the problem setting. The results show that our algorithm performs better than the baselines in all of the experiments. We also show that the algorithm is sound; that is, as the budget increases, the learned policy eventually converges to an optimal policy. Further, we theoretically bound our algorithm's regret under additional assumptions. Finally, we provide ways to achieve two popular notions of fairness, namely counterfactual fairness and demographic parity, with our algorithm.

Keywords