F1000Research (Jul 2021)

Annotating clause boundary labels to the written composition corpus of Japanese elementary and junior high school students [version 1; peer review: 2 approved]

  • Mizuho Imada,
  • Takumi Tagawa,
  • Chang-Yun Moon,
  • Akio Nasu

DOI
https://doi.org/10.12688/f1000research.40669.1
Journal volume & issue
Vol. 10

Abstract

To evaluate the development of children’s writing ability, it is necessary not only to examine quantitative indices such as dependency distance, but also to inquire into the types of structures they use. We conducted clause boundary labeling using a Support Vector Machine (SVM) on a corpus of Japanese students' compositions to investigate how the tendency of clause use changes as school age progresses. The analysis of clause label frequency per sentence showed an increase in attributive clauses, nominal clauses, quotation clauses, and continuous clauses, and a decrease in parallel clauses, conditional clauses, reason clauses, time clauses, indirect interrogative clauses, and main clauses. The analysis of dependency distance demonstrated that most of the clauses that increased had short dependency distances, while most of the clauses that decreased had long dependency distances, and that the relative frequency of clauses with short dependency distances increased with school age. In addition, there was a shift in clause selection among functionally similar clauses, such as from “-te” to continuous forms, from “-tara” to “-ba”, and from “-kedo” and “-keredo” to “-ga”. These results suggest a change in the children’s lexical and grammatical choices, from coordinate to subordinate structures, and from spoken to written vocabulary.
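As a rough illustration of the two indices named in the abstract, the following minimal sketch computes clause label frequency per sentence and mean dependency distance per label. It is not the authors' pipeline: the input format, the function name `clause_stats`, the label strings, and the definition of dependency distance as the absolute difference between a clause's position and its head's position are all assumptions made for the example, and the sample data are invented.

```python
from collections import Counter, defaultdict

def clause_stats(sentences):
    """Hypothetical helper. Each sentence is a list of
    (label, position, head_position) tuples; head_position is None
    for a clause with no head (e.g. the main clause)."""
    label_counts = Counter()
    distances = defaultdict(list)
    for sentence in sentences:
        for label, pos, head in sentence:
            label_counts[label] += 1
            if head is not None:
                # Assumed definition: linear distance between clause and head.
                distances[label].append(abs(head - pos))
    n = len(sentences)
    if n == 0:
        return {}, {}
    freq_per_sentence = {lab: c / n for lab, c in label_counts.items()}
    mean_distance = {lab: sum(d) / len(d) for lab, d in distances.items()}
    return freq_per_sentence, mean_distance

# Invented toy data: two sentences with labeled clauses.
sample = [
    [("attributive", 0, 1), ("main", 3, None)],
    [("te", 0, 4), ("attributive", 2, 3), ("main", 4, None)],
]
freq, dist = clause_stats(sample)
```

Comparing `freq` and `dist` across school grades would be one way to look for the kind of trends the abstract reports, under the assumptions stated above.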