Unsupervised Text Segmentation Predicts Eye Fixations During Reading

Jinbiao Yang; Antal van den Bosch; Stefan L. Frank

doi:10.3389/frai.2022.731615

Frontiers in Artificial Intelligence (Feb 2022)

Affiliations

Jinbiao Yang: Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands
Jinbiao Yang: Centre for Language Studies, Radboud University, Nijmegen, Netherlands
Antal van den Bosch: KNAW Meertens Institute, Amsterdam, Netherlands
Stefan L. Frank: Centre for Language Studies, Radboud University, Nijmegen, Netherlands

Journal volume & issue: Vol. 5

Abstract

Words typically form the basis of psycholinguistic and computational linguistic studies of sentence processing. However, recent evidence shows that the basic units during reading, i.e., the items in the mental lexicon, are not always words but can also be sub-word and supra-word units. To recognize these units, human readers require a cognitive mechanism to learn and detect them. In this paper, we assume that eye fixations during reading reveal the locations of the cognitive units, and that the cognitive units are analogous to the text units discovered by unsupervised segmentation models. We predict eye fixations from model-segmented units on both English and Dutch text. The results show that model-segmented units predict eye fixations better than word units do. We take this predictive performance as an indication of the plausibility of model-segmented units as cognitive units. The Less-is-Better (LiB) model, which finds the units that minimize both long-term and working memory load, offers advantages over the alternative models in both prediction score and efficiency. Our results also suggest that modeling the least-effort principle for the management of long-term and working memory can lead to inferring cognitive units. Overall, the study supports the theory that the mental lexicon stores not only words but also smaller and larger units, suggests that fixation locations during reading depend on these units, and shows that unsupervised segmentation models can discover these units.

Published in Frontiers in Artificial Intelligence

ISSN: 2624-8212 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.frontiersin.org/journals/artificial-intelligence#