Populating a multilingual ontology of proper names from open sources

Agata Savary; Leszek Manicki; Małgorzata Baron

doi:10.15398/jlm.v1i2.63

Journal of Language Modelling (Nov 2013)

Affiliations

Agata Savary: Université François Rabelais Tours
Leszek Manicki: Institute of Computer Science, Polish Academy of Sciences
Małgorzata Baron: Institute of Computer Science, Polish Academy of Sciences

Journal volume & issue: Vol. 1, no. 2

Abstract

Although proper names play a central role in natural language processing (NLP) applications, they are still under-represented in lexicons, annotated corpora, and other resources dedicated to text processing. A major challenge lies in both the prevalence and the dynamicity of proper names. At the same time, large and regularly updated knowledge sources containing partially structured data, such as Wikipedia or GeoNames, are publicly available and contain large numbers of proper names. We present a method for the semi-automatic enrichment of Prolexbase, an existing multilingual ontology of proper names dedicated to NLP, with data extracted from these open sources in three languages: Polish, English and French. Fine-grained data extraction and integration procedures allow the user to enrich the existing contents of Prolexbase with newly incoming data. All data are manually validated and available under an open licence.
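The integration step described above (enriching existing entries with new data while keeping every addition subject to manual validation) can be illustrated with a minimal sketch. It assumes a hypothetical, heavily simplified record (`NameEntry`) that maps a language code to a set of name forms; this is not Prolexbase's actual schema, and `propose_additions`, `apply_validated` and `Proposal` are illustrative names rather than the authors' procedure. The key design choice shown is that incoming data never overwrites existing content: only forms not already present become proposals awaiting a human decision.

```python
from dataclasses import dataclass, field


@dataclass
class NameEntry:
    """Hypothetical, simplified record for one proper-name concept.

    `lemmas` maps a language code ("pl", "en", "fr") to its set of name forms.
    """
    concept_id: str
    lemmas: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class Proposal:
    """A candidate addition awaiting manual validation (hypothetical)."""
    concept_id: str
    lang: str
    form: str
    source: str


def propose_additions(existing: NameEntry, incoming: dict[str, set[str]],
                      source: str) -> list[Proposal]:
    """Return one proposal per incoming form not already in the entry.

    Existing content is left untouched; a validator later accepts or
    rejects each proposal. Sorting keeps the output order deterministic.
    """
    proposals = []
    for lang in sorted(incoming):
        known = existing.lemmas.get(lang, set())
        for form in sorted(incoming[lang] - known):
            proposals.append(Proposal(existing.concept_id, lang, form, source))
    return proposals


def apply_validated(entry: NameEntry, accepted: list[Proposal]) -> None:
    """Add only the forms a human validator has accepted."""
    for p in accepted:
        entry.lemmas.setdefault(p.lang, set()).add(p.form)


if __name__ == "__main__":
    entry = NameEntry("warsaw", {"pl": {"Warszawa"}, "en": {"Warsaw"}})
    incoming = {"pl": {"Warszawa"}, "en": {"Warsaw"}, "fr": {"Varsovie"}}
    props = propose_additions(entry, incoming, source="wikipedia")
    print(props)                   # only the French form is new
    apply_validated(entry, props)  # here we pretend the validator accepted all
    print(sorted(entry.lemmas))
```

In a real pipeline the acceptance step would be an interactive review rather than the blanket acceptance used in the usage example.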

Published in Journal of Language Modelling

ISSN: 2299-856X (Print); 2299-8470 (Online)
Publisher: Institute of Computer Science, Polish Academy of Sciences
Country of publisher: Poland
LCC subjects: Language and Literature: Philology. Linguistics
Website: http://jlm.ipipan.waw.pl/
