Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts

Spackman K; Dubay C; Hersh WR; Cohen AM

doi:10.1186/1471-2105-6-103

BMC Bioinformatics (Apr 2005)

Journal volume & issue: Vol. 6, no. 1, p. 103

Abstract

Background
Text-mining can assist biomedical researchers in reducing information overload by extracting useful knowledge from large collections of text. We developed a novel text-mining method based on analyzing the network structure created by symbol co-occurrences as a way to extend the capabilities of knowledge extraction. The method was applied to the task of automatic gene and protein name synonym extraction.

Results
Performance was measured on a test set consisting of about 50,000 abstracts from one year of MEDLINE. Synonyms retrieved from curated genomics databases were used as a gold standard. The system obtained a maximum F-score of 22.21% (23.18% precision and 21.36% recall), with high efficiency in the use of seed pairs.

Conclusion
The method performs comparably with other studied methods, does not rely on sophisticated named-entity recognition, and requires little initial seed knowledge.
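
The abstract does not specify how the network structure is scored, so the Python below is only a hypothetical sketch of the general idea, not the authors' algorithm. It assumes each abstract has already been reduced to a list of symbols, links two symbols when they appear in the same abstract, and ranks pairs of candidate gene symbols by the Jaccard overlap of their co-occurrence neighborhoods (a common similarity measure substituted here for illustration). It does not use seed pairs. The function names, the 0.75 threshold, and the toy data are all assumptions. The `evaluate` helper shows how precision, recall, and F-score (their harmonic mean) are computed against a gold-standard set of synonym pairs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sketch: Jaccard neighborhood overlap is an assumed scoring
# rule chosen for illustration, not the method described in the paper.


def build_cooccurrence(docs):
    """Map each symbol to the set of symbols it shares at least one document with."""
    neighbors = defaultdict(set)
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_synonym_candidates(neighbors, candidates, min_score=0.75):
    """Score candidate-symbol pairs by how similar their co-occurrence neighborhoods are.

    Each symbol is removed from the other's neighborhood so that a direct
    co-occurrence between the two neither helps nor hurts the score.
    """
    scored = []
    for x, y in combinations(sorted(candidates), 2):
        score = jaccard(neighbors[x] - {y}, neighbors[y] - {x})
        if score >= min_score:
            scored.append((x, y, score))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda t: (-t[2], t[0], t[1]))


def evaluate(predicted_pairs, gold_pairs):
    """Precision, recall, and F-score (harmonic mean) over unordered synonym pairs."""
    predicted = {frozenset(p) for p in predicted_pairs}
    gold = {frozenset(p) for p in gold_pairs}
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy "abstracts", already reduced to symbols (hypothetical data).
    docs = [
        ["TP53", "MDM2", "apoptosis"],
        ["p53", "HDM2", "apoptosis"],
        ["TP53", "HDM2", "ubiquitination"],
        ["p53", "MDM2", "ubiquitination"],
    ]
    candidates = {"TP53", "p53", "MDM2", "HDM2"}
    gold = [("TP53", "p53"), ("MDM2", "HDM2")]

    neighbors = build_cooccurrence(docs)
    ranked = rank_synonym_candidates(neighbors, candidates)
    print(ranked)  # [('HDM2', 'MDM2', 1.0), ('TP53', 'p53', 1.0)]
    print(evaluate([(x, y) for x, y, _ in ranked], gold))  # (1.0, 1.0, 1.0)
```

The sketch rests on the assumption that two names for the same gene appear alongside the same partners and terms even if they rarely co-occur directly, which is why each symbol is removed from the other's neighborhood before the comparison.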

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/