Solving multi-armed bandit problems using a chaotic microresonator comb

Jonathan Cuevas; Ryugo Iwami; Atsushi Uchida; Kaoru Minoshima; Naoya Kuse

doi:10.1063/5.0173287

APL Photonics (Mar 2024)

Affiliations

Jonathan Cuevas: Graduate School of Sciences and Technology for Innovation, Tokushima University, 2-1, Minami-Josanjima, Tokushima 770-8506, Japan
Ryugo Iwami: Department of Information and Computer Sciences, Saitama University, 255 Shimo-okubo, Sakura-ku, Saitama 338-8570, Japan
Atsushi Uchida: Department of Information and Computer Sciences, Saitama University, 255 Shimo-okubo, Sakura-ku, Saitama 338-8570, Japan
Kaoru Minoshima: Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
Naoya Kuse: Institute of Post-LED Photonics, Tokushima University, 2-1, Minami-Josanjima, Tokushima 770-8506, Japan

DOI: https://doi.org/10.1063/5.0173287
Journal volume & issue: Vol. 9, no. 3
pp. 036112 – 036112-10

Abstract

The Multi-Armed Bandit (MAB) problem, foundational to reinforcement learning-based decision-making, addresses the challenge of maximizing rewards amid multiple uncertain choices. While algorithmic solutions are effective, their computational efficiency diminishes with increasing problem complexity. Photonic accelerators, leveraging temporal and spatial-temporal chaos, have emerged as promising alternatives. However, despite these advancements, current approaches either compromise computation speed or amplify system complexity. In this paper, we introduce a chaotic microresonator frequency comb (chaotic comb) to tackle the MAB problem, where each comb mode is assigned to a slot machine. Through a proof-of-concept experiment, we employ 44 comb modes to address an MAB with 44 slot machines, demonstrating performance competitive with both conventional software algorithms and other photonic methods. Furthermore, the scalability of decision making is explored with up to 512 slot machines using experimentally obtained temporal chaos in different time slots. Power-law scalability is achieved with an exponent of 0.96, outperforming conventional software-based algorithms. Moreover, we find that a numerically calculated chaotic comb accurately reproduces experimental results, paving the way for discussions on strategies to increase the number of slot machines.
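For readers unfamiliar with the problem setting, the following is a minimal sketch of a conventional software baseline of the kind the abstract compares against. It is not the chaotic-comb method of the paper. It uses the standard UCB1 rule on a 44-arm Bernoulli bandit; the hit probabilities, the number of rounds, and the function name `ucb1` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def ucb1(probs, n_rounds, seed=0):
    """Play a Bernoulli bandit with the UCB1 rule.

    probs    : hit probability of each slot machine (hypothetical values)
    n_rounds : total number of plays
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(probs)
    counts = [0] * n_arms   # number of times each arm was played
    sums = [0.0] * n_arms   # accumulated reward of each arm
    total = 0

    for t in range(1, n_rounds + 1):
        if t <= n_arms:
            # Initialisation: play every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return counts, total


# Assumed test case: 44 slot machines, one clearly better than the rest.
probs = [0.3] * 43 + [0.9]
counts, total = ucb1(probs, n_rounds=20000)
best_arm = counts.index(max(counts))
print("most-played arm:", best_arm, "total reward:", total)
```

Each decision in this baseline scans every arm, so the per-decision cost grows with the number of slot machines. This is the kind of scaling cost that motivates the photonic approaches discussed in the abstract.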

Published in APL Photonics

ISSN: 2378-0967 (Online)
Publisher: AIP Publishing LLC
Country of publisher: United States
LCC subjects: Technology: Engineering (General). Civil engineering (General): Applied optics. Photonics
Website: https://aplphotonics.aip.org
