Enhancing Data Integration with Text Analysis to Find Proteins Implicated in Plant Stress Response

Hassani-Pak Keywan; Legaie Roxane; Canevet Catherine; Berg Hugo A. van den; Moore Jonathan D.; Rawlings Christopher J.

doi:10.1515/jib-2010-121

Journal of Integrative Bioinformatics (Dec 2010)

Affiliations

Hassani-Pak Keywan: Centre for Mathematical and Computational Biology, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Legaie Roxane: Warwick Systems Biology Centre, University of Warwick, United Kingdom of Great Britain and Northern Ireland
Canevet Catherine: Centre for Mathematical and Computational Biology, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Berg Hugo A. van den: Warwick Systems Biology Centre, University of Warwick, United Kingdom of Great Britain and Northern Ireland
Moore Jonathan D.: Warwick Systems Biology Centre, University of Warwick, United Kingdom of Great Britain and Northern Ireland
Rawlings Christopher J.: Centre for Mathematical and Computational Biology, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland

Journal volume & issue: Vol. 7, no. 3
pp. 87–99

Abstract

High-throughput genomic studies can identify large numbers of candidate genes, which investigators must interpret and filter to select the best ones for further analysis. Prioritization is generally based on evidence supporting the role of a gene product in the biological process under investigation. The two most important sources of such evidence are bioinformatics databases and the scientific literature. In this paper we present an extension to the Ondex data integration framework that applies text mining to Medline abstracts, so that both bodies of evidence can be accessed in a consistent way. In an example use case, we apply our method to create a knowledge base of Arabidopsis proteins implicated in plant stress response and use various scoring metrics to identify key protein-stress associations. We show that the additional text mining features highlight proteins, on the basis of the scientific literature, that would not have been found using data integration alone. Ondex is an open-source software project and can be downloaded, together with the text mining features described here, from www.ondex.org.
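
The abstract does not say which scoring metrics are used, so the following is only an illustrative sketch of one common way to score protein-stress associations from abstracts: document-level co-occurrence counts turned into pointwise mutual information (PMI). It is not the Ondex implementation. The function name `cooccurrence_scores`, the protein names `ProteinA` and `ProteinB`, and the example sentences are all hypothetical, and plain substring matching stands in for proper named-entity recognition.

```python
import math
from collections import Counter
from itertools import product


def cooccurrence_scores(abstracts, proteins, stresses):
    """Score protein-stress pairs by PMI over a list of abstract texts.

    Assumption: a term is "mentioned" if it occurs as a case-insensitive
    substring of the abstract. Real text mining would use tokenization
    and synonym dictionaries instead.
    """
    n = len(abstracts)
    protein_docs = Counter()  # number of abstracts mentioning each protein
    stress_docs = Counter()   # number of abstracts mentioning each stress
    pair_docs = Counter()     # number of abstracts mentioning both

    for text in abstracts:
        lowered = text.lower()
        found_proteins = {p for p in proteins if p.lower() in lowered}
        found_stresses = {s for s in stresses if s.lower() in lowered}
        protein_docs.update(found_proteins)
        stress_docs.update(found_stresses)
        pair_docs.update(product(found_proteins, found_stresses))

    # PMI = log2( P(protein, stress) / (P(protein) * P(stress)) )
    return {
        (p, s): math.log2(joint * n / (protein_docs[p] * stress_docs[s]))
        for (p, s), joint in pair_docs.items()
    }


# Invented toy abstracts, not real Medline records.
toy_abstracts = [
    "ProteinA is induced by drought.",
    "ProteinA responds to drought and salinity.",
    "ProteinB is induced by cold.",
    "Cold acclimation requires ProteinB.",
]
scores = cooccurrence_scores(
    toy_abstracts, ["ProteinA", "ProteinB"], ["drought", "cold", "salinity"]
)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs that never co-occur are simply absent from the result. A positive score means the two terms appear together more often than their individual frequencies would predict.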

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib