Ibérica (Jul 2020)

Can comparable corpora be compared?

  • Belén López Arroyo

Journal volume & issue
Vol. 39
pp. 43 – 68

Abstract

While there is consensus on the definition of a comparable corpus, there is little or no agreement on what makes a corpus comparable or how to assess comparability. A comparable corpus consists of two or more collections of texts (subcorpora) in different languages or language varieties that are similar in some way. But in what way? According to McEnery and Xiao (2007: 20), proportion, genre, domain, and time are the main criteria for compiling a comparable corpus, and they must match across the languages for the corpus to be considered comparable. However, previous studies (López Arroyo & Roberts, 2017) have shown, through the analysis of two specialized comparable corpora, that these criteria work well for certain fields but not for all. In the present study, we examine comparability from the point of view of the purpose for which a comparable corpus is to be used. To do so, we compiled a comparable corpus of 150 tasting notes in English and Spanish, written by two experts in the field in Spain and in the USA and published in the same decades; according to McEnery and Xiao (2007), our corpora meet all the criteria for comparability. However, our methodology, which focuses on the content, format, and style of the genre under study, will show that proportion, genre, domain, time, and size are not sufficient for establishing that corpora are comparable.

Keywords