Cadernos de Estudos Lingüísticos (Dec 2017)

Annotating a polysynthetic language: From Portuguese to Kadiwéu

  • Charlotte Galves,
  • Filomena Sandalo,
  • Ticiana A. de Sena,
  • Luiz Veronesi

DOI
https://doi.org/10.20396/cel.v59i3.8651003
Journal volume & issue
Vol. 59, no. 3

Abstract

For Kadiwéu, a polysynthetic language of Brazil, we propose an extension of the part-of-speech (POS) annotation used in the Tycho Brahe Annotated Corpus of Historical Portuguese (www.tycho.iel.unicamp.br/~tycho/corpus), henceforth TBC. The extension consists of tagging both words and morphemes, yielding a two-level annotation. Word-level tagging is necessary to generate the syntactic parsing that is missing from current corpora of Brazilian native languages. Morphological tagging is also crucial for polysynthetic languages, since it allows searching for grammatical properties encoded by the morphemes. The proposal is pioneering: it is the first time an American Indian language will be part of a corpus that allows grammatical searches including both morphological and syntactic information.

Keywords