Crowdsourcing ratings for single lexical items

Elena Volodina; David Alfter; Therese Lindström Tiedemann

doi:10.4312/slo2.0.2022.2.5-61

Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave (Dec 2022)

Affiliations

Elena Volodina: University of Gothenburg, Sweden
David Alfter: University of Gothenburg, Sweden; Université Catholique de Louvain, Belgium
Therese Lindström Tiedemann: University of Helsinki, Finland

Journal volume & issue: Vol. 10, no. 2

Abstract

In this study, we investigate theoretical and practical issues connected to differentiating between core and peripheral vocabulary at different levels of linguistic proficiency using statistical approaches combined with crowdsourcing. We also investigate whether crowdsourcing second language learners’ rankings can be used for assigning levels to unseen vocabulary. The study is performed on Swedish single-word items. The four hypotheses we examine are: (1) there is core vocabulary for each proficiency level, but this is only true until CEFR level B2 (upper-intermediate); (2) core vocabulary shows more systematicity in its behavior and usage, whereas peripheral items have more idiosyncratic behavior; (3) given that we have truly core items (also called anchor items) for each level, we can place any new unseen item in relation to the identified core items by using a series of comparative judgment tasks, thereby assigning a “target” level to a previously unseen item; and (4) non-experts will perform on par with experts in a comparative judgment setting. The hypotheses are largely confirmed. In relation to (1) and (2), our results show that there seems to be some systematicity in core vocabulary for early to mid-levels (A1-B1), while we find less systematicity for higher levels (B2-C1). In relation to (3), we suggest crowdsourcing word rankings using comparative judgment with known anchor words as a method to assign a “target” level to unseen words. With regard to (4), we confirm previous findings that non-experts, in our case language learners, can be used effectively for linguistic annotation tasks in a comparative judgment setting.
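To make the idea behind hypothesis (3) concrete, the following is a minimal sketch of how pairwise judgments against anchor words could be turned into a "target" level. It is not the authors' procedure: the majority-vote rule, the function name `assign_target_level`, the 0.5 threshold, and the placeholder anchor words are all assumptions made for illustration only.

```python
from collections import defaultdict

# CEFR levels in ascending order of difficulty (A1 to C1, as in the abstract).
LEVELS = ["A1", "A2", "B1", "B2", "C1"]


def assign_target_level(judgments, anchor_levels, threshold=0.5):
    """Place an unseen word relative to anchor words (illustrative heuristic).

    judgments: list of (anchor_word, new_word_judged_harder) pairs, one per
        comparative judgment collected from the crowd.
    anchor_levels: dict mapping each anchor word to its known CEFR level.
    threshold: assumed cut-off; the unseen word is placed at the lowest level
        whose anchors it is NOT judged harder than in more than this share
        of comparisons.
    """
    if not judgments:
        raise ValueError("at least one comparative judgment is required")

    harder = defaultdict(int)  # per level: times the unseen word was judged harder
    total = defaultdict(int)   # per level: number of comparisons made

    for anchor, new_is_harder in judgments:
        level = anchor_levels[anchor]
        total[level] += 1
        harder[level] += int(new_is_harder)

    for level in LEVELS:
        if total[level] == 0:
            continue  # no anchors of this level were used in the comparisons
        if harder[level] / total[level] <= threshold:
            return level

    # Judged harder than the anchors at every compared level.
    return LEVELS[-1]


# Hypothetical anchors and judgments (placeholders, not data from the study).
anchor_levels = {"a1_word": "A1", "a2_word": "A2", "b1_word": "B1"}
judgments = [
    ("a1_word", True), ("a1_word", True),
    ("a2_word", True), ("a2_word", False), ("a2_word", True),
    ("b1_word", False), ("b1_word", True), ("b1_word", False),
]
target = assign_target_level(judgments, anchor_levels)
```

In this toy run the unseen word is judged harder than most A1 and A2 anchors but not harder than most B1 anchors, so the heuristic places it at B1. A fuller treatment would more likely fit a pairwise-comparison model over all judgments than count votes per level.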

Published in Slovenščina 2.0: Empirične, aplikativne in interdisciplinarne raziskave

ISSN: 2335-2736 (Online)
Publisher: University of Ljubljana Press (Založba Univerze v Ljubljani)
Country of publisher: Slovenia
LCC subjects: Language and Literature: Philology. Linguistics
Website: https://journals.uni-lj.si/slovenscina2
