Predicting First-Language and Second-Language Proficiency Using Eye Fixation Data and Demographic Information: Assumptions, Data Representations, and Methods

Soroosh Shalileh; Matvey Kairov; Ranga Baminiwatte; Olga Parshina; Olga Dragoy

doi:10.1109/ACCESS.2024.3468460

IEEE Access (Jan 2024)

Affiliations

Soroosh Shalileh: Center for Language and Brain, HSE University, Moscow, Russia
Matvey Kairov: Laboratory of Artificial Intelligence for Cognitive Sciences, HSE University, Moscow, Russia
Ranga Baminiwatte: School of Computing, Clemson University, Clemson, SC, USA
Olga Parshina: Psychology Department, Middlebury College, Middlebury, VT, USA
Olga Dragoy: Center for Language and Brain, HSE University, Moscow, Russia

DOI: https://doi.org/10.1109/ACCESS.2024.3468460
Journal volume & issue: Vol. 12, pp. 145832–145844

Abstract

Studying first-language (L1) and second-language (L2) acquisition and bilingualism using eye movement data has become a popular topic in the psycholinguistic and educational research communities. The current research uses eye fixation data, along with demographic information, to investigate the following five research questions (RQs). $Q_{1}$: Is it possible to predict L1 from eye fixation data using artificial intelligence (AI) methods? $Q_{2}$: Is it possible to predict second-language proficiency (L2P) from eye fixation data using AI methods? $Q_{3}$: Which of the six L2P assessment batteries under consideration is most effective in predicting L2P? $Q_{4}$: How informative is eye fixation data, alone or combined with demographic information, in predicting L1 and L2P? $Q_{5}$: How can eye fixation data be represented for training AI models to predict L1 and L2P? We used the MECO L2 data set and scrutinized the performance of three families of AI methods. With respect to each RQ, the results showed that 1) using only eye fixation data, it is possible to predict L1 with a ROC-AUC of 0.755; 2) using only eye fixation data, it is not possible to predict L2P accurately (an $R^{2}$-score of only 0.216 was obtained); 3) L2 Lexical Skills is the most effective L2P assessment battery; 4) combining the eye fixation data with demographic features significantly improved model performance, yielding a ROC-AUC of 0.997 in predicting L1 and an $R^{2}$-score of 0.899 in predicting L2P, while simultaneously reducing the impact of the eye fixation parameters; 5) 2D scatter plot images can be considered an appropriate candidate representation for training AI models using only eye fixation data, at least for predicting L1.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
