Computers (Jun 2024)
An NLP-Based Exploration of Variance in Student Writing and Syntax: Implications for Automated Writing Evaluation
Abstract
In writing assessment, expert human evaluators ideally judge individual essays with attention to variance among writers’ syntactic patterns. There are many ways to compose text successfully or less successfully. For automated writing evaluation (AWE) systems to provide accurate assessment and relevant feedback, they must be able to consider similar kinds of variance. The current study employed natural language processing (NLP) to explore variance in syntactic complexity and sophistication across clusters characterized in a large corpus (n = 36,207) of middle school and high school argumentative essays. Using NLP tools, k-means clustering, and discriminant function analysis (DFA), we observed that student writers employed four distinct syntactic patterns: (1) familiar and descriptive language, (2) consistently simple noun phrases, (3) variably complex noun phrases, and (4) moderate complexity with less familiar language. Importantly, each pattern spanned the full range of writing quality; there were no syntactic patterns consistently evaluated as “good” or “bad”. These findings support the need for nuanced approaches in automated writing assessment while informing ways that AWE can participate in that process. Future AWE research can and should explore similar variability across other detectable elements of writing (e.g., vocabulary, cohesion, discursive cues, and sentiment) via diverse modeling methods.
Keywords