Journal of Language Modelling (Dec 2012)

Slovak Morphosyntactic Tagset

  • Radovan Garabík
  • Mária Šimková

DOI
https://doi.org/10.15398/jlm.v0i1.35
Journal volume & issue
no. 1

Abstract

Morphological annotation is essential and widely used linguistic information in corpora, especially for highly inflectional languages. The morphological tagset used in the Slovak National Corpus was designed with several goals in mind: the tags are compact and easily human-readable without sacrificing their informational content. The tags consist of ASCII letters, digits and several other characters. They generally have a variable number of symbols, but the order of the symbols is fixed, and each category or specific feature is assigned a particular character, which can be shared among several parts of speech. The tagset is highly functional and pragmatic, although some allowances had to be made to accommodate the traditional analysis of Slovak morphology and part-of-speech categories. In particular, function words are classified according to their syntactic (and semantic) roles, which is one reason why the tagset is sometimes described as morphosyntactic.
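
The tag design described in the abstract (variable length, fixed symbol order, one character per feature shared across parts of speech) can be illustrated with a minimal Python sketch. The symbol inventory below is hypothetical and chosen only for illustration; it is not the actual Slovak National Corpus tagset, and the function name `decode_tag` is likewise an assumption.

```python
# Hypothetical symbol inventory, for illustration only; this is NOT the
# actual Slovak National Corpus tagset.
POS = {"S": "noun", "A": "adjective"}

# Feature slots in their obligatory order. The same symbols are used for
# every part of speech, and no symbol appears in more than one slot.
SLOTS = [
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "p": "plural"}),
    ("case", {"1": "nominative", "2": "genitive", "4": "accusative"}),
]


def decode_tag(tag):
    """Decode a compact tag into a dict of named features.

    Slots may be omitted (variable length), but the symbols that are
    present must follow the slot order defined in SLOTS.
    """
    if not tag or tag[0] not in POS:
        raise ValueError(f"unknown part of speech in tag {tag!r}")
    features = {"pos": POS[tag[0]]}
    slot_index = 0
    for symbol in tag[1:]:
        # Skip forward to the first remaining slot that accepts this symbol;
        # never move backwards, which enforces the obligatory order.
        while slot_index < len(SLOTS) and symbol not in SLOTS[slot_index][1]:
            slot_index += 1
        if slot_index == len(SLOTS):
            raise ValueError(f"symbol {symbol!r} is unknown or out of order in {tag!r}")
        name, values = SLOTS[slot_index]
        features[name] = values[symbol]
        slot_index += 1
    return features
```

Under these assumptions, `decode_tag("Sfs1")` yields a noun that is feminine, singular and nominative, while a shorter tag such as `"Ap"` is still valid because slots may be left out. A tag such as `"S1f"` is rejected because the gender symbol follows the case symbol.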

Keywords