Zeitschrift für Sprachwissenschaft (Nov 2021)

Towards a broad-coverage graphemic analysis of large historical corpora

  • Waldenberger Sandra,
  • Dipper Stefanie,
  • Lemke Ilka

DOI
https://doi.org/10.1515/zfs-2021-2037
Journal volume & issue
Vol. 40, no. 3
pp. 401 – 420

Abstract

This paper presents a method which we are developing to explore graphemic variation in large historical corpora of German. Historical corpora provide an amount of data at the level of graphemics which cannot be handled exhaustively using common methods of manual evaluation. To deal with this challenge, we apply methods from computational linguistics to pave the way for a broad-coverage graph(em)ic analysis of large historical corpora. In this paper, we show how our approach can be applied to the Reference Corpus of Middle High German. Illustrating our method and linguistic analysis, we present findings from our investigations into diatopic and/or diachronic variation as documented in 13th and 14th century charters (Urkunden) from the corpus.

Keywords