RIDE (Sep 2017)

Deutsches Textarchiv

  • Dario Kampkaspar

DOI
https://doi.org/10.18716/ride.a.6.8
Journal volume & issue
Vol. 6

Abstract

Owing to its well-documented TEI subset and highly accurate transcriptions, usually based on the first edition of a text, the German Text Archive (Deutsches Textarchiv, DTA) is currently one of the best corpora of historical German texts (1600-1900), albeit not necessarily the most extensive. As of August 2017, it contains around 3260 works. The full texts of the corpus are accompanied by digital facsimiles, encoded with regard to their visual features (e.g. layout, fonts), and annotated linguistically (e.g. PoS tagging). The search engine focuses on linguistic features and allows searching for both exact spellings and spelling variants of a word. Because the corpus is intended for historical linguistics, and given the selection criteria underlying the collection, it provides no commentary and no variants from editions other than the first print of a work. Although there are minor points of criticism, the quality of its textual sources and its accurate documentation make the DTA a point of reference for other corpora.

Keywords