Calidoscópio (May 2010)
Corpus compilation: Representativeness and the CORPOBRAS
Abstract
This paper discusses a key parameter in corpus design and compilation: representativeness. This parameter concerns the need for a corpus to include texts that represent a range of uses of the language, so that comprehensive descriptions can be developed. The paper also presents CORPOBRAS, a corpus of Brazilian Portuguese that comprises 27 discourse genres and whose compilation is guided by the representativeness parameter. Finally, the paper lists several corpus-based studies that draw on CORPOBRAS data.
Key words: CORPOBRAS, corpus linguistics, genre variation, representativeness, oral and written discourse.