IJCoL (Dec 2023)
The Kolipsi Corpus Family: Resources for Learner Corpus Research in Italian and German
Abstract
This article describes the Kolipsi Corpus Family (KCF), a collection of eight related resources for learner corpus research in German and Italian. The KCF supports the study of second language (L2) acquisition of Italian and German in upper secondary schools. It includes four L2 corpora with a comparable corpus design (with respect to data collection, writing tasks, additional metadata, annotation and processing), portraying two homogeneous learner groups and their learner varieties. The corpora are representative of language learners in the multilingual Italian province of South Tyrol, where both languages are taught daily. The L2 corpora were collected at two points in time, in 2007 (Kolipsi-1) and 2014 (Kolipsi-2), and all texts were labeled with CEFR levels to allow comparisons of proficiency levels across time. L2 German texts were collected in schools with Italian as the main language of instruction, whereas L2 Italian texts were collected in schools with German as the main language of instruction. Additional resources within the KCF allow researchers to compare the students’ language competences in their L2 with their competences in their first language (L1) in a different task (Kolipsi-Matura) and with those of similarly aged L1 writers performing the same task (Kolipsi-1-L1). All texts are freely available to the scientific community. The data can be accessed via an ANNIS search interface and via the Eurac Research CLARIN Repository, from which corpus data can be downloaded in various formats.