LingVaria (May 2023)

Korpus XIX w. Uniwersytetu Warszawskiego i IJP PAN

  • Marek Łaziński
  • Rafał L. Górski
  • Michał Woźniak

DOI
https://doi.org/10.12797/LV.18.2023.35.09
Journal volume & issue
Vol. 18, no. 1(35)

Abstract

CORPUS OF THE 19TH CENTURY OF THE WARSAW UNIVERSITY AND IJP PAN

The article describes a historical corpus documenting Polish of the 19th and early 20th centuries. The corpus was created as part of a research grant aimed at investigating how the Polish aspectual system developed over the last 250 years, against the background of Czech and Russian. An important resource for this investigation was a database of aspectual triplets, which was in turn built from materials such as text corpora. Since no large corpus of the 19th and early 20th centuries was available, this gap had to be bridged. Such a corpus was built in the course of the project and is now publicly accessible without restrictions. This comprehensive corpus contains more than 12 million contemporary words. Its texts come from major Polish virtual libraries. It is POS-tagged with a tagger designed specifically for 19th-century texts. The corpus can be queried through a web-based concordancer, an adapted version of ParaVoz, and queries can be constrained by metadata.

Keywords