CogniTextes (Jun 2019)

How large should a dense corpus be for reliable studies in early language acquisition?

  • Christophe Parisse

DOI
https://doi.org/10.4000/cognitextes.1483
Journal volume & issue
Vol. 19

Abstract

Dense corpora have been put forward as necessary tools for corpus studies of language acquisition. Despite their great interest, they are not yet frequently used, probably because of the high cost involved in their creation. The goal of the present study was to predict the optimal size of a dense longitudinal corpus when used to infer, manually or automatically, the details of lexical or syntactic development in child language. The results show that corpora of at least 30 to 40 one-hour recordings are necessary, but that longer corpora using the same protocol provide little new information. Dense corpora are indeed very useful, but do not need to be overly large to study grammatical development. This has important consequences for corpus-building projects, which can be optimized. The existence of a limit to the amount of information provided by large corpora also has important consequences for linguistic theory, as this helps locate the threshold between learning frozen forms and generalizing knowledge about language structure.

Keywords