OGER++: hybrid multi-type entity recognition

Lenz Furrer; Anna Jancso; Nicola Colic; Fabio Rinaldi

doi:10.1186/s13321-018-0326-3

Journal of Cheminformatics (Jan 2019)

Affiliations

Lenz Furrer: Institute of Computational Linguistics, University of Zurich
Anna Jancso: Institute of Computational Linguistics, University of Zurich
Nicola Colic: Institute of Computational Linguistics, University of Zurich
Fabio Rinaldi: Institute of Computational Linguistics, University of Zurich

Journal volume & issue: Vol. 11, no. 1
pp. 1 – 10

Abstract

Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.

Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively.

Conclusions: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
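The two-stage design described in the abstract (dictionary look-up over normalized text, followed by a postfilter) can be illustrated with a minimal Python sketch. This is not the OGER++ implementation: the function names, the normalization rule (lowercasing and splitting letters from digits), the concept ID "EX:0001", and the scoring callable that stands in for the feed-forward neural network are all assumptions made for illustration.

```python
import re


def normalize(text):
    """Lowercase and split into letter runs and digit runs.

    Assumed normalization rule for this sketch: punctuation and
    whitespace are dropped, so "IL-2", "IL2" and "il 2" coincide.
    """
    return tuple(re.findall(r"[a-z]+|[0-9]+", text.lower()))


def build_index(terminology):
    """Map each normalized term to the (entity type, concept ID) pairs it may denote."""
    index = {}
    for term, entity_type, concept_id in terminology:
        index.setdefault(normalize(term), set()).add((entity_type, concept_id))
    return index


def annotate(text, index, max_len=5):
    """Return (matched text, entity type, concept ID) for every dictionary hit."""
    tokens = [(m.group(), m.start(), m.end())
              for m in re.finditer(r"[A-Za-z]+|[0-9]+", text)]
    hits = []
    for i in range(len(tokens)):
        # Try every token window of up to max_len tokens starting at i.
        for j in range(i + 1, min(i + max_len, len(tokens)) + 1):
            key = tuple(tok.lower() for tok, _, _ in tokens[i:j])
            for entity_type, concept_id in sorted(index.get(key, ())):
                span = text[tokens[i][1]:tokens[j - 1][2]]
                hits.append((span, entity_type, concept_id))
    return hits


def postfilter(hits, score, threshold=0.5):
    """Keep only hits whose score reaches the threshold.

    `score` is a hypothetical stand-in for a trained classifier;
    any callable returning a number per hit works here.
    """
    return [hit for hit in hits if score(hit) >= threshold]
```

With a toy terminology of `("IL-2", "gene", "EX:0001")` and `("interleukin 2", "gene", "EX:0001")`, calling `annotate("Levels of IL2 and Interleukin-2 rose.", index)` matches both "IL2" and "Interleukin-2", because the spelling variants share a normalized key. The postfilter then decides which of these candidate annotations to keep.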

Published in Journal of Cheminformatics

ISSN: 1758-2946 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Chemistry
Website: https://jcheminf.biomedcentral.com/
