Word prediction in computational historical linguistics

Peter Dekker; Willem Zuidema

doi:10.15398/jlm.v8i2.268

Journal of Language Modelling (Feb 2021)

Affiliations

Peter Dekker: AI Lab, Vrije Universiteit Brussel
Willem Zuidema: Institute for Logic, Language and Computation, University of Amsterdam

DOI: https://doi.org/10.15398/jlm.v8i2.268
Journal volume & issue: Vol. 8, no. 2
pp. 295–336

Abstract

In this paper, we investigate how the prediction paradigm from machine learning and Natural Language Processing (NLP) can be put to use in computational historical linguistics. We propose word prediction as an intermediate task, in which the forms of unseen words in a target language are predicted from the forms of the corresponding words in a source language. Word prediction allows us to develop algorithms for phylogenetic tree reconstruction, sound correspondence identification and cognate detection, in ways close to attested methods for linguistic reconstruction. We discuss the factors that have to be taken into account when applying prediction methods in historical linguistics, such as data representation and the choice of machine learning model. We present our own implementations and evaluate them on different tasks in historical linguistics.

Published in Journal of Language Modelling

ISSN: 2299-856X (Print); 2299-8470 (Online)
Publisher: Institute of Computer Science, Polish Academy of Sciences
Country of publisher: Poland
LCC subjects: Language and Literature: Philology. Linguistics
Website: http://jlm.ipipan.waw.pl/
