Scientific Data (Sep 2023)

SC2EGSet: StarCraft II Esport Replay and Game-state Dataset

  • Andrzej Białecki,
  • Natalia Jakubowska,
  • Paweł Dobrowolski,
  • Piotr Białecki,
  • Leszek Krupiński,
  • Andrzej Szczap,
  • Robert Białecki,
  • Jan Gajewski

DOI
https://doi.org/10.1038/s41597-023-02510-7
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 12

Abstract

As a relatively new form of sport, esports offers unparalleled data availability. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and compared to laboratory-based measurements. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premier StarCraft II tournaments since 2016. We processed 55 “replaypacks” that contained 17,930 files with game-state information. At the time of its publication, our dataset is one of the few large publicly available sources of StarCraft II data. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.