Information (Mar 2025)

Comparison of Grammar Characteristics of Human-Written Corpora and Machine-Generated Texts Using a Novel Rule-Based Parser

  • Simon Strübbe,
  • Irina Sidorenko,
  • Renée Lampe

DOI
https://doi.org/10.3390/info16040274
Journal volume & issue
Vol. 16, no. 4
p. 274

Abstract

Read online

As the prevalence of machine-written texts grows, it has become increasingly important to distinguish between human- and machine-generated content, especially when such texts are not explicitly labeled. Current artificial intelligence (AI) detection methods primarily focus on human-like characteristics, such as emotionality and subjectivity. However, these features can be easily modified through AI humanization, which involves altering word choice. In contrast, altering the underlying grammar without affecting the conveyed information is considerably more challenging. Thus, the grammatical characteristics of a text can be used as additional indicators of its origin. To address this, we employ a newly developed rule-based parser to analyze the grammatical structures in human- and machine-written texts. Our findings reveal systematic grammatical differences between human- and machine-written texts, providing a reliable criterion for the determination of the text origin. We further examine the stability of this criterion in the context of AI humanization and translation to other languages.

Keywords