Northern European Journal of Language Technology (Sep 2013)

Stagger: an Open-Source Part of Speech Tagger for Swedish

  • Robert Östling

DOI
https://doi.org/10.3384/nejlt.2000-1533.1331
Journal volume & issue
Vol. 3

Abstract

This work presents Stagger, a new open-source part-of-speech tagger for Swedish based on the Averaged Perceptron. By using the SALDO morphological lexicon and semi-supervised learning in the form of Collobert and Weston embeddings, it reaches an accuracy of 96.4% on the standard Stockholm-Umeå Corpus dataset, making it the best single part-of-speech tagging system reported for Swedish. Accuracy increases to 96.6% on the latest version of the corpus, where the annotation has been revised to increase consistency. Stagger is also evaluated on a new corpus of Swedish blog posts to investigate its out-of-domain performance.