Frontiers in Applied Mathematics and Statistics (Jul 2018)
Performance in Multi-Armed Bandit Tasks in Relation to Ambiguity-Preference Within a Learning Algorithm
Abstract
Ellsberg paradox in decision theory posits that people will inevitably choose a known probability of winning over an unknown probability of winning even if the known probability is low [1]. One of the prevailing theories that addresses the Ellsberg paradox is known as “ambiguity-aversion.” In this study, we investigated the properties of ambiguity-aversion in four distinct types of reinforcement learning algorithms: ucb1-tuned [2], modified ucb1-tuned, softmax [3], and tug-of-war [4, 5]. We took the following scenario as our sample, in which there were two slot machines and each machine dispenses a coin according to a probability that is generated by its own probability density function (PDF). We then investigated the choices of a learning algorithm in such multi-armed bandit tasks. There were different reactions in multi-armed bandit tasks, depending on the ambiguity-preference in the learning algorithms. Notably, we discovered a clear performance enhancement related to ambiguity-preference in a learning algorithm. Although this study does not directly address the issue of ambiguity-aversion theory highlighted in Ellsberg paradox, the differences among different learning algorithms suggest that there is room for further study regarding the Ellsberg paradox and the decision theory.
Keywords