Gragoatá (Jun 2015)

AN ANNOTATED CORPUS WITH SUPPORT VERB CONSTRUCTIONS IN PORTUGUESE

  • Amanda Pontes Rassi,
  • Jorge Baptista,
  • Oto Araújo Vale

Journal volume & issue
Vol. 20, no. 38

Abstract

Read online

The support verb constructions (SVC) are a type of nominal construction, where the core predicate is the noun, called 'predicative noun' (Npred), which is assisted by a verb, called 'support verb' (Vsup). The Lexicon‑Grammar theoretical and methodological framework was adopted, in this paper, for the linguistic description and formalization of SVC in Portuguese. Considering the syntactic and semantic differences between SVC and other types of constructions, the purpose of this paper is to present the methodology and results of creating a corpus annotated with Vsup and Npred. A list with 4,668 SVC was built, considering 45 variants of Vsup and around 3,200 different Npred. Based on this list, we extracted 121,198 sentences from PLN.Br full corpus, from which 2,646 sentences have been manually annotated. This sample may constitute a reference corpus for the processing of SVC and used as a golden standard for evaluating the automatic tasks of identification, extraction or classification of SVC, as well as for other Natural Language Processing (NLP) applications.

Keywords