Applied Sciences (Nov 2023)
Evaluating Familiarity Ratings of Domain Concepts with Interpretable Machine Learning: A Comparative Study
Abstract
Psycholinguistic properties such as concept familiarity and concreteness have been investigated in relation to technological innovations in teaching and learning. Owing to ongoing advances in semantic representation and machine learning technologies, the automatic extrapolation of lexical psycholinguistic properties has received increased attention across a number of disciplines in recent years. However, little attention has been paid to the reliable and interpretable assessment of familiarity ratings for domain concepts. To address this gap, we present a regression model grounded in advanced natural language processing and interpretable machine learning techniques that predicts domain concepts’ familiarity ratings from their lexical features. Each domain concept is represented at both the orthographic–phonological level and the semantic level by means of pretrained word embedding models. We then compare the performance of six tree-based regression models (adaptive boosting, gradient boosting, extreme gradient boosting, a light gradient boosting machine, categorical boosting, and a random forest) in predicting domain concepts’ familiarity ratings. Experimental results show that categorical boosting, with the lowest MAPE (0.09) and the highest R² value (0.02), is best suited to predicting domain concepts’ familiarity. The results also show the promise of integrating tree-based regression models and interpretable machine learning techniques to expand psycholinguistic resources. Specifically, the semantic information of raw words and the parts of speech in domain concepts are reliable indicators when predicting familiarity ratings. Our study underlines the importance of leveraging domain concepts’ familiarity ratings; future research should aim to improve familiarity extrapolation methods. Scholars should also investigate the correlation between students’ engagement in online discussions and their familiarity with domain concepts.
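The two metrics used to rank the models, MAPE and R², can be illustrated with a minimal sketch. It assumes the standard textbook definitions of both metrics, nonzero true ratings, and a selection rule of lowest MAPE with ties broken by highest R². The function names and the toy ratings are hypothetical and are not taken from the study.

```python
from typing import Dict, Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.09 means 9%).

    Assumes every true rating is nonzero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes the true ratings are not all identical (SS_tot > 0).
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def select_best(y_true: Sequence[float],
                predictions: Dict[str, Sequence[float]]) -> str:
    """Return the model name with the lowest MAPE (ties: highest R²).

    This selection rule is an illustrative assumption.
    """
    return min(
        predictions,
        key=lambda name: (mape(y_true, predictions[name]),
                          -r2(y_true, predictions[name])),
    )


if __name__ == "__main__":
    # Hypothetical familiarity ratings and model outputs, for illustration only.
    ratings = [2.0, 4.0, 5.0]
    outputs = {
        "model_a": [2.5, 3.5, 5.5],
        "model_b": [3.0, 3.0, 4.0],
    }
    for name, pred in outputs.items():
        print(name, round(mape(ratings, pred), 3), round(r2(ratings, pred), 3))
    print("selected:", select_best(ratings, outputs))
```

Because MAPE is scale-relative and R² measures variance explained against a mean-only baseline, a model can have a small MAPE while its R² stays close to zero; the two numbers answer different questions and are best read together.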
Keywords