The TRANSCOMP Dataset of Literary Translations from 120 Languages and a Parallel Collection of English-language Originals

Matt Erlin; Andrew Piper; Douglas Knox; Stephen Pentecost; Allie Blank

doi:10.5334/johd.94

Journal of Open Humanities Data (Dec 2022)

Affiliations

Matt Erlin: Germanic Languages and Literatures, Washington University, St. Louis
Andrew Piper: Languages, Literatures, and Cultures, McGill University, Montreal
Douglas Knox: Humanities Digital Workshop, Washington University, St. Louis
Stephen Pentecost: Humanities Digital Workshop, Washington University, St. Louis
Allie Blank: Humanities Digital Workshop, Washington University, St. Louis

Journal volume & issue: Vol. 8

Abstract

The TRANSCOMP Dataset of Literary Translations is a collection of document-level word frequencies sampled from 10,631 translations into English of global literary fiction published since 1950, together with a historically matched parallel corpus of 10,682 fictional works originally published in English. We provide CSV files with word frequency counts for 10,000-word samples taken from each text. The associated metadata are available in a separate CSV file. These data will be useful to literary scholars and linguists working in translation studies, and to those interested in the linguistic, stylistic, and thematic specificity of translations from particular regions.
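As a rough illustration of how document-level counts from fixed-size samples can be used, the following Python sketch converts raw counts to relative frequencies and averages one word's frequency across translations and originals. The file layouts, the column names `doc_id`, `word`, `count`, and `group`, the helper functions, and the toy rows are all hypothetical; consult the dataset's own documentation for its actual CSV structure.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layouts (assumptions, not the dataset's documented schema):
# counts CSV: one row per (doc_id, word) with a raw count from a 10,000-word sample
# metadata CSV: one row per doc_id with a "group" column ("translation" or "original")
COUNTS_CSV = """doc_id,word,count
t1,the,600
t1,said,20
o1,the,550
o1,said,35
"""
METADATA_CSV = """doc_id,group
t1,translation
o1,original
"""

SAMPLE_SIZE = 10_000  # each text is represented by a 10,000-word sample


def relative_frequencies(counts_text, sample_size=SAMPLE_SIZE):
    """Return {doc_id: {word: count / sample_size}} from a long-format counts CSV."""
    freqs = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(counts_text)):
        freqs[row["doc_id"]][row["word"]] = int(row["count"]) / sample_size
    return dict(freqs)


def group_means(freqs, metadata_text, word):
    """Average relative frequency of `word` per metadata group (0.0 if absent in a document)."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(metadata_text)):
        doc_freqs = freqs.get(row["doc_id"], {})
        groups[row["group"]].append(doc_freqs.get(word, 0.0))
    return {group: sum(values) / len(values) for group, values in groups.items()}


if __name__ == "__main__":
    freqs = relative_frequencies(COUNTS_CSV)
    print(group_means(freqs, METADATA_CSV, "said"))
```

Because every text is sampled at the same length, dividing by the sample size puts all documents on a common scale; if the real files contain samples whose counts do not sum to exactly 10,000, dividing by each document's own total would be the safer choice.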

Published in Journal of Open Humanities Data

ISSN: 2059-481X (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: General Works: History of scholarship and learning. The humanities; Language and Literature
Website: https://openhumanitiesdata.metajnl.com/