Natural Language Processing Journal (Sep 2024)
Automatic language ability assessment method based on natural language processing
Abstract
Background and Objectives:: The Wechsler Abbreviated Scales of Intelligence second edition (WASI-II) is a standardised assessment tool that is widely used to assess cognitive ability in clinical, research, and educational settings. In one of the components of this assessment, referred to as the Vocabulary task, the assessed individuals are presented with words (called stimulus items), and asked to explain what each word mean. Their responses are hand-scored based on a list of pre-rated sample responses [0-Point (poor), 1-Point (moderate), or 2-Point (excellent)] that is provided in the accompanying manual of WASI-II. This scoring method is time-consuming, and scoring of responses that do not fully match the pre-rated ones may vary between individual scorers. In this study, we aim to use natural language processing techniques to automate the scoring procedure and make it more time-efficient and reliable (objective). Methods:: Utilising five different word embeddings (Word2vec, Global Vectors, Bidirectional Encoder Representations from Transformers, Generative Pre-trained Transformer 2, and Embeddings from Language Model), we transformed stimulus items and pre-rated responses from the WASI-II Vocabulary task into machine-readable vectors. We measured distance with cosine similarity, evaluating each model against a rational-expectations hypothesis that vector representations for stimuli should align closely with 2-Point responses and diverge from 0-Point responses. Assessment involved frequency of consistent representation and the Pearson correlation coefficient, examining overall consistency with the manual’s ranking across all items and sample responses. Results:: The Word2vec model showed the highest consistency with the WASI-II manual (frequency = 20 out of 27; Pearson Correlation coefficient = 0.61) while Bidirectional Encoder Representations from Transformers was the worst performing model (frequency = 5; Pearson Correlation coefficient = 0.05). The consistency of these two models with the WASI-II manual differed significantly, Z = 2.282, p = 0.022. Conclusions:: Our results showed that the scoring of the WASI-II Vocabulary task can be automated with moderate accuracy relying upon off-the-shelf embedding models. These results are promising, and could be improved further by considering alternative vector dimensions, similarity metrics, and data preprocessing techniques to those used in this study.