Frontiers in Psychology (Jun 2021)

On the Correlation of Context-Aware Language Models With the Intelligibility of Polish Target Words to Czech Readers

  • Klára Jágrová,
  • Michael Hedderich,
  • Marius Mosbach,
  • Tania Avgustinova,
  • Dietrich Klakow

DOI
https://doi.org/10.3389/fpsyg.2021.662277
Journal volume & issue
Vol. 12

Abstract

This contribution seeks to provide a rational probabilistic explanation for the intelligibility of words in a genetically related language that is unknown to the reader, a phenomenon referred to as intercomprehension. In this research domain, linguistic distance, among other factors, has been shown to correlate well with the mutual intelligibility of individual words. However, the role of context in the intelligibility of target words in sentences has been addressed in only a few studies. To address this, we analyze data from web-based experiments in which Czech (CS) respondents were asked to translate highly predictable target words in the final position of Polish sentences. We compare correlations of target word intelligibility with data from 3-gram language models (LMs) to their correlations with data obtained from context-aware LMs. More specifically, we evaluate two context-aware LM architectures: Long Short-Term Memory networks (LSTMs), which can, in theory, take infinitely long-distance dependencies into account, and Transformer-based LMs, which can access the whole input sequence at the same time. We investigate how their use of context affects surprisal and its correlation with intelligibility.
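The general approach described in the abstract, computing the surprisal of a sentence-final target word under a language model and correlating it with human intelligibility scores, can be illustrated with a minimal sketch. The snippet below is not the authors' code and uses hypothetical toy data: a hand-rolled trigram model with add-one smoothing stands in for the 3-gram LM, and the corpus, stimuli, and intelligibility proportions are placeholders, not the study's materials.

```python
# Minimal sketch (hypothetical data, not the study's materials or code):
# correlate trigram-LM surprisal of sentence-final target words with
# human intelligibility scores.
import math
from collections import Counter
from scipy.stats import pearsonr

def train_trigram(sentences):
    """Count trigrams and their bigram contexts from tokenized sentences."""
    tri, bi = Counter(), Counter()
    for toks in sentences:
        toks = ["<s>", "<s>"] + toks + ["</s>"]
        for i in range(2, len(toks)):
            tri[(toks[i - 2], toks[i - 1], toks[i])] += 1
            bi[(toks[i - 2], toks[i - 1])] += 1
    vocab = {w for s in sentences for w in s} | {"</s>"}
    return tri, bi, len(vocab)

def surprisal(tri, bi, v, context, word):
    """Surprisal in bits of `word` given the last two context tokens (add-one smoothing)."""
    w1, w2 = context[-2], context[-1]
    p = (tri[(w1, w2, word)] + 1) / (bi[(w1, w2)] + v)
    return -math.log2(p)

# Toy training corpus and stimuli (placeholders).
corpus = [["to", "jest", "dom"], ["to", "jest", "kot"], ["ona", "ma", "kota"]]
tri, bi, v = train_trigram(corpus)

stimuli = [(["to", "jest"], "dom"), (["to", "jest"], "pies"), (["ona", "ma"], "kota")]
intelligibility = [0.9, 0.4, 0.8]  # hypothetical proportions of correct translations

surprisals = [surprisal(tri, bi, v, ctx, w) for ctx, w in stimuli]
r, p_value = pearsonr(surprisals, intelligibility)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")
```

A context-aware variant would follow the same pattern, replacing the trigram probability with the word probability assigned by an LSTM or Transformer LM conditioned on the full sentence prefix; the correlation step is unchanged.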

Keywords