The Brazilian Portuguese Lexicon: An Instrument for Psycholinguistic Research.

PLoS ONE. 2015;10(12):e0144016 DOI 10.1371/journal.pone.0144016

 

Journal Title: PLoS ONE

ISSN: 1932-6203 (Online)

Publisher: Public Library of Science (PLoS)

LCC Subject Category: Medicine | Science

Country of publisher: United States

Language of fulltext: English

Full-text formats available: PDF, HTML, XML

 

AUTHORS

Gustavo L Estivalet
Fanny Meunier

EDITORIAL INFORMATION

Time From Submission to Publication: 24 weeks

 

ABSTRACT

In this article, we present the Brazilian Portuguese Lexicon, a new word-based corpus for psycholinguistic and computational linguistic research in Brazilian Portuguese. We describe the development of the corpus and the specific characteristics of the internet site and database for user access. We also perform distributional analyses of the corpus and compare it with other current databases. Our main objective was to provide a large, reliable, and useful word-based corpus with a dynamic, easy-to-use, and intuitive interface with free internet access for word and word-criteria searches. We used the Núcleo Interinstitucional de Linguística Computacional's corpus as the basic data source and developed the Brazilian Portuguese Lexicon by deriving and adding metalinguistic and psycholinguistic information about Brazilian Portuguese words. We obtained a final corpus with more than 30 million word tokens, 215 thousand word types, and 25 categories of information about each word. This corpus was made available on the internet via a free-access site with two search engines: a simple search and a complex search. The simple engine searches for a list of words, while the complex engine accepts criteria on any of the corpus categories. The output presents all corpus entries that match the criteria specified in the search and can be downloaded as a .csv file. We created a module in the results that delivers basic statistics about each search. The Brazilian Portuguese Lexicon also provides a pseudoword engine and specific tools for linguistic and statistical analysis. The Brazilian Portuguese Lexicon is therefore a convenient instrument for stimulus search, selection, control, and manipulation in psycholinguistic experiments, and it is also a powerful database for computational linguistics research and language modeling related to lexicon distribution, functioning, and behavior.
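The difference between the two search modes, and the kind of per-search statistics mentioned in the abstract, can be illustrated offline on a downloaded .csv file. The following is only a minimal sketch, not the site's actual implementation: the column names (`word`, `category`, `n_letters`, `freq_per_million`), the function names, and the toy rows are all assumptions made for illustration and do not reflect the real export format or real corpus values.

```python
import csv
import io
import statistics

# Toy rows for illustration only. Column names and values are assumptions,
# not the real Brazilian Portuguese Lexicon export format or corpus data.
SAMPLE_CSV = """word,category,n_letters,freq_per_million
casa,noun,4,120.5
falar,verb,5,80.0
azul,adjective,4,15.2
mesa,noun,4,30.1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric fields."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["n_letters"] = int(row["n_letters"])
        row["freq_per_million"] = float(row["freq_per_million"])
        rows.append(row)
    return rows


def simple_search(rows, words):
    """Simple search: keep the entries whose word form is in the list."""
    wanted = set(words)
    return [r for r in rows if r["word"] in wanted]


def complex_search(rows, category=None, min_freq=None, n_letters=None):
    """Complex search: keep entries matching every criterion that is set."""
    result = []
    for r in rows:
        if category is not None and r["category"] != category:
            continue
        if min_freq is not None and r["freq_per_million"] < min_freq:
            continue
        if n_letters is not None and r["n_letters"] != n_letters:
            continue
        result.append(r)
    return result


def basic_stats(rows, field):
    """Return count, mean, min, and max of a numeric field."""
    values = [r[field] for r in rows]
    if not values:
        # No matches: report an empty result instead of raising.
        return {"n": 0, "mean": None, "min": None, "max": None}
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(SAMPLE_CSV)
listed = simple_search(rows, ["casa", "azul"])
nouns = complex_search(rows, category="noun", n_letters=4)
print([r["word"] for r in listed])
print([r["word"] for r in nouns])
print(basic_stats(nouns, "freq_per_million"))
```

In a stimulus-selection workflow, a criteria-based filter of this kind would typically be used to match experimental conditions on properties such as length and frequency, with the summary statistics serving as a quick check that the selected sets are comparable.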