CogniTextes (Jun 2019)
The importance of sampling frames in representative historical corpora: a case study of Parisian theater
Abstract
Cognitive linguistics makes specific claims about language use, and corpora are our most powerful tool for testing those claims. Representative sampling (Laplace 1814) is a technique that allows us to study smaller, more manageable corpora and to generalize our results to a broader sampling frame. For a sampled corpus to be relevant to our research questions, its sampling frame must have an understandable connection to the subject of those questions.
In my dissertation study (Grieve-Smith 2009), I tested the type frequency hypothesis of analogical extension (Bybee 1995) using the FRANTEXT corpus (CNRTL 2018). In this study, I compare the theatrical texts in FRANTEXT from 1800–1815 with the new Digital Parisian Stage corpus, sampled from Wicks (1950 et seq.), a catalog of every play that premiered in Paris in the nineteenth century. Declarative sentence negations in the Digital Parisian Stage corpus occurred with ne … pas in 73.9% of tokens, while in FRANTEXT they occurred with ne … pas in only 50.5% of tokens. This difference shows that FRANTEXT is biased in favor of elite literary language. To properly test usage-based theories of language change, we will need a representative corpus covering a century or more.
Keywords