Gene Ontology density estimation and discourse analysis for automatic GeneRiF extraction

Mottaz Anaïs; Ehrler Frédéric; Tbahriti Imad; Gobeill Julien; Veuthey Anne-Lise; Ruch Patrick

doi:10.1186/1471-2105-9-S3-S9

BMC Bioinformatics (Apr 2008)

Journal volume & issue: Vol. 9, no. Suppl 3, p. S9

Abstract


Background
This paper describes and evaluates a sentence selection engine that extracts a GeneRiF (Gene Reference into Function), as defined in Entrez Gene, from a MEDLINE record. The inputs for this task are a gene and a pointer to a MEDLINE reference. The suggested approach merges two independent sentence extraction strategies. The first strategy (LASt) uses argumentative features inspired by discourse-analysis models. The second (GOEx) uses an automatic text categorizer to estimate the density of Gene Ontology categories in every sentence, thus providing a full ranking of all candidate GeneRiFs. A combination of the two approaches is proposed, which also aims to reduce the size of the selected segment by filtering out rhetorical phrases that carry no content.

Results
On the TREC-2003 Genomics collection for GeneRiF identification, the LASt extraction strategy alone is already competitive (52.78%). In the combined approach, extraction clearly improves, achieving a Dice score of over 57% (+10%).

Conclusions
Argumentative representation levels and conceptual density estimation using Gene Ontology contents appear to be complementary for functional annotation in proteomics.
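To make the pipeline concrete, here is a minimal, hypothetical sketch of ranking candidate sentences by a linear combination of an argumentative score and a Gene Ontology density score, then scoring the top candidate against a reference GeneRiF with a unigram Dice coefficient. Several assumptions are made, and none of them is the authors' method. The argumentative labels (`PURPOSE`, `METHODS`, `RESULTS`, `CONCLUSION`) are assumed to be given. The weights in `ARG_WEIGHTS` and the mixing parameter `alpha` are invented for illustration. GO density is approximated by counting matches against a small single-word lexicon rather than by running a real GO text categorizer. The TREC 2003 Genomics track also defined several Dice variants, while this sketch uses only a plain set-based unigram form.

```python
import re

# Hypothetical weights per argumentative class (a LASt-like signal);
# the paper's actual classifier and weighting are not reproduced here.
ARG_WEIGHTS = {"PURPOSE": 0.2, "METHODS": 0.1, "RESULTS": 0.6, "CONCLUSION": 1.0}


def tokens(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def go_density(sentence, go_terms):
    """Fraction of sentence tokens found in a (hypothetical) single-word GO
    lexicon: a crude stand-in for a GO text categorizer (GOEx-like signal)."""
    toks = tokens(sentence)
    if not toks:
        return 0.0
    return sum(t in go_terms for t in toks) / len(toks)


def rank_sentences(sentences, labels, go_terms, alpha=0.5):
    """Rank sentences best-first by alpha * argumentative weight
    + (1 - alpha) * GO density. Ties keep their original order."""
    scored = [
        (alpha * ARG_WEIGHTS.get(label, 0.0)
         + (1 - alpha) * go_density(sentence, go_terms), sentence)
        for sentence, label in zip(sentences, labels)
    ]
    return [s for _, s in sorted(scored, key=lambda p: p[0], reverse=True)]


def dice(candidate, reference):
    """Unigram Dice coefficient over sets of word tokens:
    2 * |A & B| / (|A| + |B|)."""
    a, b = set(tokens(candidate)), set(tokens(reference))
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    sentences = [
        "We studied the role of BRCA1.",
        "BRCA1 regulates DNA repair and apoptosis.",
    ]
    labels = ["PURPOSE", "CONCLUSION"]
    go_terms = {"repair", "apoptosis"}  # hypothetical toy lexicon
    best = rank_sentences(sentences, labels, go_terms)[0]
    print(best)
    print(dice(best, "BRCA1 regulates DNA repair"))
```

With `alpha = 0.5`, the second sentence scores 0.5 × 1.0 + 0.5 × (2/6) ≈ 0.67, while the first sentence scores 0.1, so the second is selected. A linear mix was chosen only because it is the simplest way to combine two independent rankings. How LASt and GOEx are actually merged, including the filtering of rhetorical phrases, is not specified in this abstract.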

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/