Decision making for large-scale multi-armed bandit problems using bias control of chaotic temporal waveforms in semiconductor lasers

Kensei Morijiri; Takatomo Mihana; Kazutaka Kanno; Makoto Naruse; Atsushi Uchida

doi:10.1038/s41598-022-12155-y

Scientific Reports (May 2022)

Affiliations

Kensei Morijiri: Department of Information and Computer Sciences, Saitama University
Takatomo Mihana: Department of Information and Computer Sciences, Saitama University
Kazutaka Kanno: Department of Information and Computer Sciences, Saitama University
Makoto Naruse: Department of Information Physics and Computing, Graduate School of Information Science and Technology, The University of Tokyo
Atsushi Uchida: Department of Information and Computer Sciences, Saitama University

Journal volume & issue: Vol. 12, no. 1
pp. 1 – 11

Abstract

Decision making using photonic technologies has been intensively researched for solving the multi-armed bandit problem, which is fundamental to reinforcement learning. However, these technologies have yet to be extended to large-scale multi-armed bandit problems. In this study, we conduct a numerical investigation of decision making to solve large-scale multi-armed bandit problems by controlling the biases of chaotic temporal waveforms generated in semiconductor lasers with optical feedback. We generate chaotic temporal waveforms using the semiconductor lasers, and each waveform is assigned to a slot machine (or choice) in the multi-armed bandit problem. The biases in the amplitudes of the chaotic waveforms are adjusted based on rewards using the tug-of-war method. Subsequently, the slot machine whose biased chaotic waveform has the maximum amplitude is selected. The scaling properties of the correct decision-making process are examined by increasing the number of slot machines up to 1024, and the scaling exponent of the power-law distribution is 0.97. We demonstrate that the proposed method outperforms existing software algorithms in terms of the scaling exponent. This result paves the way for photonic decision making in large-scale multi-armed bandit problems using photonic accelerators.
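The selection rule summarized above (add a reward-driven bias to each machine's chaotic signal, then play the machine with the largest biased value) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' model. The chaotic laser waveforms are replaced by logistic-map sequences as a stand-in chaotic source, and the bias update is a simple zero-sum tug-of-war rule chosen for illustration: the played machine's bias moves by `delta` toward reward or loss, the other machines move the opposite way by `delta / (m - 1)`, and every bias decays by a forgetting factor `alpha`. The function names, `delta`, `alpha`, and the reward probabilities are hypothetical and do not come from the paper.

```python
import random


def logistic_step(x, r=3.99):
    """One logistic-map iteration; a stand-in for a chaotic laser waveform (assumption)."""
    # With r < 4 and 0 < x < 1 the value stays inside (0, 1).
    return r * x * (1.0 - x)


def run_bandit(probs, steps=2000, delta=0.1, alpha=0.99, seed=1):
    """Sketch of bias-controlled chaotic decision making with a hypothetical tug-of-war update.

    probs: reward probability of each slot machine (needs at least two machines).
    delta: hypothetical bias increment per reward or loss.
    alpha: hypothetical forgetting factor applied to every bias each step.
    Returns the list of selected machine indices and the final biases.
    """
    m = len(probs)
    if m < 2:
        raise ValueError("need at least two slot machines")
    rng = random.Random(seed)
    # One chaotic sequence per slot machine, each started from a different point.
    x = [rng.uniform(0.01, 0.99) for _ in range(m)]
    bias = [0.0] * m
    choices = []
    for _ in range(steps):
        x = [logistic_step(v) for v in x]
        # Centre each chaotic value around zero and add that machine's bias.
        scores = [(v - 0.5) + b for v, b in zip(x, bias)]
        # Play the machine whose biased signal has the maximum amplitude.
        j = max(range(m), key=scores.__getitem__)
        choices.append(j)
        reward = rng.random() < probs[j]
        sign = 1.0 if reward else -1.0
        # Zero-sum update: the played machine moves by delta and the others move
        # the opposite way by delta / (m - 1), so the biases keep summing to zero.
        for i in range(m):
            step = sign * delta if i == j else -sign * delta / (m - 1)
            bias[i] = alpha * (bias[i] + step)
    return choices, bias


if __name__ == "__main__":
    # Hypothetical eight-machine problem; machine 7 has the highest reward probability.
    probs = [0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.1, 0.9]
    choices, bias = run_bandit(probs)
    tail = choices[-500:]
    print("fraction of best-machine plays in last 500 steps:", tail.count(7) / len(tail))
```

In this toy rule, machines that pay out less than half the time lose bias whenever they are played, so the best machine's bias eventually exceeds the chaotic signal range and it is selected almost every step; the chaotic fluctuations only drive exploration while the biases are still close together. Studying how the time to reach correct decisions grows with the number of machines, as the paper does up to 1024 machines, would require the laser model and update rule described in the full text.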

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/