PLoS ONE (Jan 2019)
Spelling performance on the web and in the lab.
Abstract
Several dictionary websites give access to semantic, synonym, or spelling information about a given word. Over nine years, we systematically recorded every letter sequence entered into a French web dictionary. A total of 200 million orthographic forms were obtained, allowing us to create a large-scale database of spelling errors that could inform psychological theories about spelling processes. To check the reliability of this big data methodology, we selected from this database a sample of 100 frequently misspelled words. A group of 100 French university students then performed a spelling-to-dictation test on this list of words. The results showed a strong correlation between the two data sets on the frequencies of produced spellings (r = 0.82). Although the distributions of spelling errors were relatively consistent across the two databases, the proportions of correct responses differed significantly. Regression analyses allowed us to generate possible explanations for these differences in terms of task-dependent factors. We argue that comparing the results of these large-scale databases with those of standard, controlled experimental paradigms is a useful way to determine the conditions under which this big data methodology can adequately inform psychological theories.
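The comparison reported in the abstract, a correlation between the frequencies of spellings produced on the web and in the lab, can be illustrated with a minimal sketch. It assumes the statistic is a plain Pearson correlation computed over spelling variants observed in both data sets; the abstract does not spell out the exact procedure. The function name `pearson_r`, the dictionaries `web` and `lab`, and all the proportions in them are hypothetical toy values chosen for illustration, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the study): proportion of responses
# for each spelling variant of one target word in each data set.
web = {"accueil": 0.55, "acceuil": 0.35, "aceuil": 0.10}
lab = {"accueil": 0.70, "acceuil": 0.25, "aceuil": 0.05}

# Compare only the variants observed in both sources, in a fixed order.
variants = sorted(web.keys() & lab.keys())
r = pearson_r([web[v] for v in variants], [lab[v] for v in variants])
print(f"r = {r:.2f}")
```

In a real analysis the two sequences would hold one frequency per produced spelling across all 100 words, and a statistics library would normally replace the hand-written function.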