Graph-based sequence annotation using a data integration approach

Pesch Robert; Lysenko Artem; Hindle Matthew; Hassani-Pak Keywan; Thiele Ralf; Rawlings Christopher; Köhler Jacob; Taubert Jan

doi:10.1515/jib-2008-94

Journal of Integrative Bioinformatics (Jun 2008)

Affiliations

Pesch Robert: Department of Computer Science, Bonn-Rhein-Sieg University of Applied Sciences, Germany
Lysenko Artem: Department of Biomathematics and Bioinformatics, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Hindle Matthew: Department of Biomathematics and Bioinformatics, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Hassani-Pak Keywan: Department of Biomathematics and Bioinformatics, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Thiele Ralf: Department of Computer Science, Bonn-Rhein-Sieg University of Applied Sciences, Germany
Rawlings Christopher: Department of Biomathematics and Bioinformatics, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland
Köhler Jacob: Protein Research Group, University of Tromsø, Norway
Taubert Jan: Department of Biomathematics and Bioinformatics, Rothamsted Research, United Kingdom of Great Britain and Northern Ireland

DOI: https://doi.org/10.1515/jib-2008-94
Journal volume & issue: Vol. 5, no. 2
pp. 58 – 72

Abstract

The automated annotation of data from high-throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process, both for use by expert curators and for predicting the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database AraCyc), which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation.
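
The idea of using links between reference datasets to judge a function assignment can be illustrated with a minimal sketch. It is not the ONDEX data model or API: the identifiers (`gene:X`, `protein:P1`, `domain:D1`, `GO:A`, `GO:B`), the relation names, and the path-counting score are all hypothetical, chosen only to show how a candidate GO term reached through several independent evidence paths could be ranked above one reached through a single path.

```python
from collections import defaultdict

# Hypothetical toy graph of (source, relation, target) triples.
# All identifiers and relation names are illustrative assumptions.
EDGES = [
    ("gene:X", "similar_to", "protein:P1"),    # e.g. a sequence-similarity hit
    ("gene:X", "has_domain", "domain:D1"),     # e.g. a protein-family match
    ("protein:P1", "annotated_with", "GO:A"),
    ("domain:D1", "annotated_with", "GO:A"),
    ("protein:P1", "annotated_with", "GO:B"),
]


def build_graph(edges):
    """Index outgoing edges by source node."""
    graph = defaultdict(list)
    for source, relation, target in edges:
        graph[source].append((relation, target))
    return graph


def supporting_paths(graph, gene):
    """Collect, per GO term, the two-step paths gene -> evidence -> term."""
    support = defaultdict(list)
    for relation, evidence in graph.get(gene, []):
        for next_relation, term in graph.get(evidence, []):
            if next_relation == "annotated_with":
                support[term].append((relation, evidence))
    return dict(support)


graph = build_graph(EDGES)
support = supporting_paths(graph, "gene:X")

# Rank candidate terms by the number of supporting paths (an assumed,
# deliberately simple stand-in for a confidence measure).
ranked = sorted(support, key=lambda term: len(support[term]), reverse=True)
```

Here `GO:A` is supported by both the similarity hit and the domain match, so it ranks ahead of `GO:B`. A sequential pipeline that treated each source in isolation would not see that the two pieces of evidence converge on the same term.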

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib
