PLoS ONE (Jan 2024)
Utility of Kolmogorov complexity measures: Analysis of L2 groups and L1 backgrounds.
Abstract
The proliferation of automated syntactic complexity tools has allowed the analysis of larger amounts of learner writing. However, existing tools tend to be language-specific or depend on segmenting learner production into native-based units of analysis. This study examined the utility of a language-general, unsupervised linguistic complexity metric, Kolmogorov complexity, in discriminating between L2 proficiency levels within several languages (Czech, German, Italian, English) and across various L1 backgrounds (N = 10) using two large CEFR-rated learner corpora. Kolmogorov complexity was measured at three levels: syntax, morphology, and overall linguistic complexity. Pairwise comparisons indicated that all Kolmogorov complexity measures discriminated among the proficiency levels within the L2s. L1-based variation in complexity was also observed. Distinct syntactic and morphological complexity patterns were found when L2 English writings were analyzed across versus within L1 backgrounds. These results indicate that Kolmogorov complexity could serve as a valuable metric in L2 writing research due to its cross-linguistic flexibility and holistic nature.
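Kolmogorov complexity itself is not computable, so text-based studies usually approximate it with a compression program. The abstract does not say how the three levels were operationalized, so the following is only a minimal sketch of one common compression-based approach, not necessarily the authors' procedure. The use of zlib, the 10% deletion rate, the number of runs, and all function names are assumptions made for illustration: overall complexity is taken as compressed size relative to raw size, morphology is probed by randomly deleting characters, and syntax by randomly deleting words.

```python
import random
import zlib


def compressed_size(text: str) -> int:
    """Bytes needed to store `text` after zlib compression (level 9)."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def overall_complexity(text: str) -> float:
    """Compressed size divided by raw size; lower values mean more redundancy."""
    raw = len(text.encode("utf-8"))
    return compressed_size(text) / raw if raw else 0.0


def delete_characters(text: str, rate: float, rng: random.Random) -> str:
    """Morphological distortion (assumed): drop each non-space character
    with probability `rate`, which disrupts word-internal regularities."""
    return "".join(ch for ch in text if ch.isspace() or rng.random() >= rate)


def delete_words(text: str, rate: float, rng: random.Random) -> str:
    """Syntactic distortion (assumed): drop each whitespace-separated token
    with probability `rate`, which disrupts word-order regularities."""
    return " ".join(w for w in text.split() if rng.random() >= rate)


def distortion_ratio(text, distort, rate=0.10, seed=0, runs=20) -> float:
    """Mean compressed size of the distorted text relative to the original."""
    rng = random.Random(seed)
    base = compressed_size(text)
    sizes = [compressed_size(distort(text, rate, rng)) for _ in range(runs)]
    return sum(sizes) / (runs * base)


if __name__ == "__main__":
    # Hypothetical learner text, repeated so the compressor has material to work with.
    sample = "The student writes a short essay about the city where she lives. " * 20
    print("overall:   ", overall_complexity(sample))
    print("morphology:", distortion_ratio(sample, delete_characters))
    print("syntax:    ", distortion_ratio(sample, delete_words))
```

The sketch reports raw ratios only. Any sign convention, normalization, or text-length control needed to compare proficiency levels or L1 groups is left open here, since the abstract does not specify one.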