IJCoL (Jun 2016)
ISACCO: a corpus for investigating spoken and written language development in Italian school-age children
Abstract
In this paper we present ISACCO (Italian School-Age Children COrpus), a corpus of oral and written retellings produced by Italian-speaking children attending primary school. All texts were digitized and automatically enriched with multi-level linguistic annotation. Preliminary explorations of both the form and the content of the children’s productions were carried out on a set of features automatically extracted by NLP tools. The written retellings were also manually annotated with an error typology covering three linguistic levels. The resource, which has been made publicly available, is intended to support research on, and computational modeling of, “later language acquisition”, with an emphasis on comparatively assessing how oral and written language competencies evolve in the early school grades.