International Journal of Industrial Engineering Computations (Sep 2024)

Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework

  • Farshad Seifi ,
  • Seyed Taghi Akhavan Niaki

DOI
https://doi.org/10.5267/j.ijiec.2024.6.003
Journal volume & issue
Vol. 15, no. 4
pp. 951 – 964

Abstract

Read online

The stochastic contextual bandit problem, recognized for its effectiveness in navigating the classic exploration-exploitation dilemma through ongoing player-environment interactions, has found broad applications across various industries. This utility largely stems from the algorithms’ ability to accurately forecast reward functions and maintain an optimal balance between exploration and exploitation, contingent upon the precise selection and calibration of hyperparameters. However, the inherently dynamic and real-time nature of bandit environments significantly complicates hyperparameter tuning, rendering traditional offline methods inadequate. While specialized methods have been developed to overcome these challenges, they often face three primary issues: difficulty in adaptively learning hyperparameters in ever-changing environments, inability to simultaneously optimize multiple hyperparameters for complex models, and inefficiencies in data utilization and knowledge transfer from analogous tasks. To tackle these hurdles, this paper introduces an innovative transfer learning-based approach designed to harness past task knowledge for accelerated optimization and dynamically optimize multiple hyperparameters, making it well-suited for fluctuating environments. The method employs a dual Gaussian meta-model strategy—one for transfer learning and the other for assessing hyperparameters’ performance within the current task —enabling it to leverage insights from previous tasks while quickly adapting to new environmental changes. Furthermore, the framework’s meta-model-centric architecture enables simultaneous optimization of multiple hyperparameters. Experimental evaluations demonstrate that this approach markedly outperforms competing methods in scenarios with perturbations and exhibits superior performance in 70% of stationary cases while matching performance in the remaining 30%. This superiority in performance, coupled with its computational efficiency on par with existing alternatives, positions it as a superior and practical solution for optimizing hyperparameters in contextual bandit settings.

Keywords