Journal of Open Humanities Data (Jul 2021)

The Game Walkthrough Corpus (GWTC) – A Resource for the Analysis of Textual Game Descriptions

  • Manuel Burghardt,
  • Jochen Tiepmar

DOI
https://doi.org/10.5334/johd.34
Journal volume & issue
Vol. 7

Abstract

We present the Game Walkthrough Corpus (GWTC), which contains 12,295 unique walkthrough documents covering 6,117 games. For each walkthrough, we provide unigram and bigram frequencies, treating the walkthrough document as a bag of words. In addition, we provide word frequencies at the sentence level. The GWTC also contains game-related metadata, including title, publisher, developer, year, and genre. All language statistics and metadata are stored in separate plain text files and can be referenced through uniform resource names (URNs). These URNs can also be used to derive any combination of statistics and metadata; for instance, researchers can investigate the most frequent unigrams for games in the “Adventure” genre. In this way, the GWTC can be reused for a range of research questions on gaming language.
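
As a rough illustration of the kind of query described above, the following Python sketch counts unigrams and bigrams per document and then aggregates unigram counts over one genre. It is only a sketch under stated assumptions: the identifiers, the record layout, the sample texts, and the tokenizer are all hypothetical and do not reproduce the actual GWTC file format or URN scheme.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_counts(text):
    """Bag-of-words unigram frequencies for one document."""
    return Counter(tokenize(text))


def bigram_counts(text):
    """Frequencies of adjacent token pairs; sentence boundaries are ignored here."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical records keyed by made-up identifiers (not real GWTC URNs).
walkthroughs = {
    "urn:example:gwtc:1": {
        "genre": "Adventure",
        "text": "Open the door. Take the key and open the chest.",
    },
    "urn:example:gwtc:2": {
        "genre": "Adventure",
        "text": "Take the lamp. Open the gate.",
    },
    "urn:example:gwtc:3": {
        "genre": "Racing",
        "text": "Take the inside line on the last turn.",
    },
}


def genre_unigrams(records, genre):
    """Sum unigram counts over all documents whose metadata matches the genre."""
    total = Counter()
    for record in records.values():
        if record["genre"] == genre:
            total += unigram_counts(record["text"])
    return total


adventure = genre_unigrams(walkthroughs, "Adventure")
print(adventure.most_common(3))
```

With the corpus itself, the counting step would not be needed, since the frequencies are already provided; only the metadata filter and the aggregation would apply.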

Keywords