Scientific Reports (Mar 2024)

Performance evaluation of automated scoring for the descriptive similarity response task

  • Ryunosuke Oka,
  • Takashi Kusumi,
  • Akira Utsumi

DOI
https://doi.org/10.1038/s41598-024-56743-6
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 11

Abstract

We examined whether a machine-learning-based automated scoring system can mimic human performance on a similarity task. We trained a Bidirectional Encoder Representations from Transformers (BERT) model on the semantic similarity test (SST), which presented participants with a word pair and asked them to write about how the two concepts were similar. In Experiment 1, using fivefold cross-validation, we showed that the model trained on the combination of the responses (N = 1600) and the classification criteria (the rubric of the SST; N = 616) predicted the correct labels with 83% accuracy. In Experiment 2, using test data obtained from different participants at a different time than in Experiment 1, we showed that the models trained on the responses alone and on the combination of responses and classification criteria predicted the correct labels with 80% accuracy. In addition, human–model scoring showed an inter-rater reliability of 0.63, which was comparable to that of human–human scoring (0.67 to 0.72). These results suggest that the machine learning model can reach human-level performance in scoring the Japanese version of the SST.
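The abstract reports inter-rater reliability but does not name the statistic used. A minimal sketch of one common choice, unweighted Cohen's kappa, computed between a human rater and model predictions; the score labels below are hypothetical, not data from the study:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two raters' labels on the same items."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n     # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

# Hypothetical three-level scores (0/1/2) for eight responses
human = [0, 1, 2, 2, 1, 0, 2, 1]
model = [0, 1, 2, 1, 1, 0, 2, 2]
print(round(cohen_kappa(human, model), 2))  # → 0.62
```

Kappa corrects raw percent agreement for the agreement expected by chance, which is why it is a stricter measure than simple accuracy when score labels are imbalanced.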