Eesti Rakenduslingvistika Ühingu Aastaraamat (May 2016)

Automatic syntactic analysis of netspeak using Constraint Grammar

  • Dage Särg

DOI
https://doi.org/10.5128/ERYa12.15
Journal volume & issue
Vol. 12
pp. 253 – 267

Abstract


"Syntactic analysis of Estonian netspeak using Constraint Grammar" This paper describes an attempt to adapt the Estonian Constraint Grammar rule set to netspeak. The rule set was developed by Kaili Müürisep and Tiina Puolakainen for shallow and dependency parsing of Estonian literary language, and it had previously been adapted for shallow parsing of spoken Estonian by Kaili Müürisep and Heli Uibo. To adapt the rules, a chatroom corpus was first parsed with the existing rule set. The parsed corpus was then manually corrected, and the rule set was modified based on the errors found. The changes concerned the detection of clause boundaries and particle verbs, as well as the assignment of syntactic tags and dependency relations. The most distinctive features of netspeak appeared to be the extensive use of discourse particles and direct address, short sentence length, a small proportion of attributes among the syntactic functions, and a large number of elliptical sentences, in which the predicate, along with other syntactic functions, can be omitted. Adapting the rule set improved the results of both shallow and dependency parsing. The most error-prone syntactic functions were subjects, predicatives, and adverbials. In dependency parsing, the largest number of errors concerned the governors of adverbials.
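The error analysis summarised above compares the parser's output with the manually corrected corpus, counting mistakes per syntactic function and, for dependency parsing, wrong governors. The following is a minimal Python sketch of that kind of comparison, not the author's actual evaluation procedure. It assumes token-aligned sentences, one function tag per token, and 1-based head indices with 0 for the root; the `Token` class, the `error_profile` helper, and the tag names (`@SUBJ`, `@FMV`, `@ADVL`, `@OBJ`) are hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    func: str  # syntactic function tag, e.g. "@SUBJ" (illustrative tag set)
    head: int  # 1-based index of the governor; 0 marks the root


def error_profile(pairs):
    """Count function-tag errors and governor (head) errors, grouped by the gold tag.

    `pairs` is an iterable of (gold_sentence, parsed_sentence) tuples, where each
    sentence is a list of Token objects aligned one-to-one.
    """
    func_errors, head_errors = Counter(), Counter()
    for gold, parsed in pairs:
        if len(gold) != len(parsed):
            raise ValueError("gold and parsed sentences must be token-aligned")
        for g, p in zip(gold, parsed):
            if g.func != p.func:
                func_errors[g.func] += 1  # wrong syntactic function assigned
            if g.head != p.head:
                head_errors[g.func] += 1  # wrong governor in the dependency tree
    return func_errors, head_errors
```

For the sentence "ma lähen homme koju" ("I am going home tomorrow"), a parse that attaches *homme* to the wrong word and tags *koju* as an object would produce one tag error and one governor error, both counted under the gold adverbial tag.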

Keywords