Data-Driven Syllabification for Middle Dutch

Folgert Karsdorp; Mike Kestemont; Wouter Haverals

doi:10.16995/dm.83

Digital Medievalist (Nov 2019)

Affiliations

Folgert Karsdorp
Mike Kestemont: University of Antwerp
Wouter Haverals

DOI: https://doi.org/10.16995/dm.83
Journal volume & issue: Vol. 12, no. 1

Abstract

Automatically separating Middle Dutch words into syllables is a challenging task. A first method was presented by Bouma and Hermans (2012), who combined a rule-based finite-state component with data-driven error correction. With an average word accuracy of 96.5%, their system performs well but leaves room for improvement. More generally, rule-based methods are less attractive for a medieval language like Middle Dutch, where each dialect has its own spelling preferences and there is also much idiosyncratic variation among scribes. This paper presents a different method for automatically syllabifying Middle Dutch words, one that does not rely on pre-defined linguistic information. Using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) cells, we obtain a system that outperforms the rule-based method in both robustness and required effort.

Published in Digital Medievalist

ISSN: 1715-0736 (Online)
Publisher: Open Library of Humanities
Country of publisher: United Kingdom
LCC subjects: History (General) and history of Europe: History (General): Medieval history
Website: https://journal.digitalmedievalist.org/
