Journal of Data Mining and Digital Humanities (Dec 2022)

Machine Translation and Gender biases in video game localisation: a corpus-based analysis

  • María Rivas Ginel,
  • Sarah Theroine

DOI
https://doi.org/10.46298/jdmdh.9065
Journal volume & issue
Vol. Towards robotic translation?, no. V. The contribution of...

Abstract


The video game industry has historically been a gender-biased terrain due to the higher number of male protagonists and hypersexualised representations [Dietz, 1998; Downs & Smith, 2010; Lynch et al., 2016]. Nowadays, echoing the debate on inclusive language, companies attempt to erase gender disparity by introducing female main characters as well as non-binary characters. From a technological point of view, even though recent studies show that Machine Translation remains largely unadopted by individual video game localisers [Rivas Ginel, 2021], multilanguage vendors are willing to invest in these tools to reduce costs [LIND, 2020]. However, the predominance of the masculine in Natural Language Processing and Machine Learning has created allocation and representation biases in Neural Machine Translation [Crawford, 2017]. This paper aims to analyse the percentage of gender bias resulting from the use of Google Translate, DeepL, and SmartCat when translating in-game raw content from English into French. The games DeltaRune, The Devil's Womb and The Faces of the Forest were chosen due to the presence of non-binary characters, non-sexualised characters, and female protagonists. We compared the results in order to count and analyse the differences between these tools' output in terms of errors related to gender. To this end, we created a parallel corpus to compare the source documents with all the translations, visualised the semantic and grammatical directions of the word embeddings [Zhou et al., 2019], and extracted the collocations and concordance lines that represented gender identity by analysing the patterns in the source language.
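As a rough illustration of the kind of corpus comparison the abstract describes, the sketch below counts surface cues of grammatical gender in French MT output aligned with the English source. It is not the authors' code: the file names, the regular-expression gender cues, and the source-side keyword list are hypothetical placeholders, and real analyses would rely on proper POS tagging and concordancing rather than regexes.

```python
# Illustrative sketch (not the authors' pipeline): counting gender-marked
# French forms in three MT outputs aligned against the same English source.
# File names, keyword list, and regex cues are hypothetical assumptions.

import re
from collections import Counter

# Hypothetical aligned files: one English source segment per line,
# and one line per segment in each MT output.
SOURCES = "deltarune_en.txt"
OUTPUTS = {
    "Google Translate": "deltarune_fr_google.txt",
    "DeepL": "deltarune_fr_deepl.txt",
    "SmartCat": "deltarune_fr_smartcat.txt",
}

# Very rough surface cues for grammatical gender in French
# (articles, pronouns, and common agreement endings).
FEMININE = re.compile(r"\b(elle|la|une|cette|celle)\b|\w+(ée|euse|ière)\b", re.I)
MASCULINE = re.compile(r"\b(il|le|un|cet|celui)\b|\w+(eur|ier)\b", re.I)

def load(path):
    """Read one segment per line, stripping trailing whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

sources = load(SOURCES)

for tool, path in OUTPUTS.items():
    counts = Counter()
    for src, tgt in zip(sources, load(path)):
        # Only inspect segments whose English source mentions a character
        # of interest (placeholder cues, e.g. singular "they" references).
        if re.search(r"\b(Kris|protagonist|they|them)\b", src):
            counts["feminine"] += len(FEMININE.findall(tgt))
            counts["masculine"] += len(MASCULINE.findall(tgt))
    print(tool, dict(counts))
```

Such counts only flag candidate segments; deciding whether a masculine or feminine form is actually an error for a non-binary or female character still requires manual review of the concordance lines, as the study does.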

Keywords