CogniTextes (Jun 2019)

Non-representativeness in corpora: perils, pitfalls and challenges

  • Thomas Egan

DOI
https://doi.org/10.4000/cognitextes.1772
Journal volume & issue
Vol. 19

Abstract

This article presents and discusses some problems of representativeness that the author has encountered in over twenty years of corpus-based research. It argues that the inclusion in a general corpus of certain text types, such as grammar treatises or works of historical fiction, can lessen the representativeness of the data, especially if the corpus is designed to reflect the linguistic production, as opposed to the linguistic reception, of a speech community. It further argues that less emphasis should be placed on reception in the compilation of general corpora. Also addressed are problems relating to the comparison of texts in different languages, together with two solutions that have been proposed to counter these problems. The arguments are illustrated with examples from both contemporary and historical corpora.

Keywords