IEEE Access (Jan 2024)
Predicting First-Language and Second-Language Proficiency Using Eye Fixation Data and Demographic Information: Assumptions, Data Representations, and Methods
Abstract
Studying first-language (L1) and second-language (L2) acquisition and bilingualism using eye movement data has become a popular topic in the psycholinguistic and educational research communities. The current research uses eye fixation data, along with demographic information, to investigate five research questions (RQs): $Q_{1}$: Is it possible to predict L1 from eye fixation data using artificial intelligence (AI) methods? $Q_{2}$: Is it possible to predict second-language proficiency (L2P) from eye fixation data using AI methods? $Q_{3}$: Which of the six L2P assessment batteries under consideration is the most effective in predicting L2P? $Q_{4}$: How informative is eye fixation data, alone or combined with demographic information, in predicting L1 and L2P? $Q_{5}$: How can eye fixation data be represented for training AI models to predict L1 and L2P? We used the MECO L2 data set and scrutinized the performance of three families of AI methods. With respect to each RQ, the results showed that 1) using only eye fixation data, it is possible to predict L1 with a ROC-AUC of 0.755; 2) using only eye fixation data, it is not possible to predict L2P accurately (an $R^{2}$ score of only 0.216 was obtained); 3) L2 Lexical Skills is the most effective L2P assessment battery; 4) combining the eye fixation data with demographic features led to a significant improvement in model performance, yielding a ROC-AUC of 0.997 in predicting L1 and an $R^{2}$ score of 0.899 in predicting L2P, while simultaneously reducing the impact of the eye fixation parameters; and 5) 2D scatter-plot images can be considered an appropriate candidate representation for training AI models on eye fixation data alone, at least for predicting L1.
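The two scores reported above, ROC-AUC for the L1 classification task and $R^{2}$ for the L2P regression task, are standard metrics. As a reference point for interpreting them, here is a minimal Python sketch of their textbook definitions. It is not the authors' evaluation code, and the toy labels and predictions in the usage example are hypothetical, not MECO data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive example receives a higher score than a randomly
    chosen negative example (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for constant targets")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data for illustration only.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    print(r2_score([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))  # about 0.9486
```

Under these definitions, a ROC-AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, while an $R^{2}$ of 0.216 means the model accounts for roughly 22% of the variance in the L2P target.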
Keywords