Applied Sciences (Mar 2022)

Hybrid Training Strategies: Improving Performance of Temporal Difference Learning in Board Games

  • Jesús Fernández-Conde,
  • Pedro Cuenca-Jiménez,
  • José M. Cañas

DOI
https://doi.org/10.3390/app12062854
Journal volume & issue
Vol. 12, no. 6
p. 2854

Abstract

Temporal difference (TD) learning is a well-known approach for training automated players, through autonomous play, in board games with a limited number of potential states. Owing to its simplicity, TD learning has become widespread, but several critical difficulties must be overcome for it to be effective. Training an artificial intelligence (AI) agent against a purely random player is impractical, since the agent needs millions of games to learn to play intelligently. Training the agent against a methodical (deterministic) player, on the other hand, is not viable either, owing to a lack of exploration. This article describes and examines several hybrid training procedures for a TD-based automated player, in which random moves are combined with predefined plays in a predetermined ratio. We provide simulation results for the well-known tic-tac-toe and Connect-4 board games, in which one of the studied training strategies significantly outperforms the others. On average, an agent trained with this strategy requires fewer than 100,000 training games to play tic-tac-toe flawlessly.
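The hybrid idea in the abstract — an opponent that plays randomly with a fixed probability and follows a scripted policy otherwise, while the learner updates state values by temporal difference — can be sketched as follows. This is a minimal illustration, not the paper's implementation; the names `hybrid_move`, `td_update`, and the `random_ratio` parameter are assumptions introduced here.

```python
import random

def hybrid_move(board, legal_moves, scripted_policy, random_ratio, rng=random):
    """Training opponent's move: random with probability `random_ratio`,
    otherwise the deterministic scripted policy. Mixing the two in a
    predetermined ratio gives exploration without the slowness of a
    fully random opponent."""
    if rng.random() < random_ratio:
        return rng.choice(legal_moves)
    return scripted_policy(board, legal_moves)

def td_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
    """TD(0) value update: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
    Unseen states default to a value of 0.0."""
    v_s = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    values[state] = v_s + alpha * (reward + gamma * v_next - v_s)
    return values[state]

# Example: a scripted policy that always picks the first legal move,
# mixed with 30% random play during training games.
scripted = lambda board, moves: moves[0]
rng = random.Random(42)
move = hybrid_move(board=None, legal_moves=[0, 4, 8],
                   scripted_policy=scripted, random_ratio=0.3, rng=rng)
```

With `random_ratio = 0.0` the opponent is fully methodical (no exploration); with `random_ratio = 1.0` it is fully random (slow convergence); the paper's strategies vary this mixture.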

Keywords