CogniTextes (Jun 2019)

The importance of sampling frames in representative historical corpora : a case study of Parisian theater

  • Angus B. Grieve-Smith

DOI
https://doi.org/10.4000/cognitextes.1671
Journal volume & issue
Vol. 19

Abstract

Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool to test those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora, and generalize our results to a broader sampling frame. For a sampled corpus to be relevant to our research questions, its sampling frame must have an understandable connection to the subject of our research question. In my dissertation study (Grieve-Smith 2009) I tested the type frequency hypothesis of analogical extension (Bybee 1995) using the FRANTEXT corpus (CNRTL 2018). In this study I test the theatrical texts in FRANTEXT from 1800-1815 against the new Digital Parisian Stage corpus, sampled from Wicks (1950 et seq.), a catalog of every play that premiered in Paris in the nineteenth century. Declarative sentence negations in the Digital Parisian Stage corpus occurred with ne … pas in 73.9% of tokens, while in FRANTEXT they only occurred with ne … pas in 50.5% of tokens. This shows that FRANTEXT is biased in favor of elite literary language. To properly test usage-based theories of language change we will need a representative corpus covering a century or more.

Keywords