Current Plant Biology (Aug 2014)

Biological process annotation of proteins across the plant kingdom

  • Joachim W. Bargsten,
  • Edouard I. Severing,
  • Jan-Peter Nap,
  • Gabino F. Sanchez-Perez,
  • Aalt D.J. van Dijk

DOI
https://doi.org/10.1016/j.cpb.2014.07.001
Journal volume & issue
Vol. 1, no. C
pp. 73 – 82

Abstract

Accurate annotation of protein function is key to understanding life at the molecular level, but automated annotation of functions is challenging. Here we combine a method for protein function annotation that uses network information to predict the biological processes a protein is involved in with a sequence-based prediction method. The combined function prediction is based on co-expression networks and integrates the network-based prediction method BMRF with the sequence-based prediction method Argot2. The combination performs significantly better than either method on its own, and better than Blast2GO. The approach was applied to predict biological processes for the proteomes of rice, barrel clover, poplar, soybean and tomato. The novel function predictions are available at www.ab.wur.nl/bmrf. Analysis of the relationship between sequence similarity and predicted function similarity identifies numerous cases in which the biological processes that proteins are involved in have diverged despite sequence similarity. This indicates that integrating network-based and sequence-based function prediction helps in analyzing evolutionary relationships. Examples of potential divergence are identified for various biological processes, notably for processes related to cell development, regulation, and response to chemical stimulus. Such divergence in biological process annotation for proteins with similar sequences should be taken into account when analyzing plant gene and genome evolution.

DATA: All gene function predictions are available online (http://www.ab.wur.nl/bmrf/). The online resource can be queried for predictions for proteins or for Gene Ontology terms of interest, and the results can be downloaded in bulk. Queries can be based on protein identifiers, biological process Gene Ontology identifiers, or text descriptors of biological processes.
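The comparison of sequence similarity with predicted function similarity described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the Jaccard index as the function-similarity measure, both thresholds, the protein names and the GO identifiers used as sample data are all assumptions made here for illustration only.

```python
from itertools import combinations


def jaccard(terms_a, terms_b):
    """Jaccard index of two collections of GO identifiers (0.0 if both are empty)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def divergent_pairs(go_terms, seq_identity, min_identity=0.7, max_function_sim=0.2):
    """Flag protein pairs with similar sequences but dissimilar predicted processes.

    go_terms:      dict mapping protein ID -> set of predicted GO identifiers
    seq_identity:  dict mapping frozenset({id1, id2}) -> sequence identity in [0, 1]
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    flagged = []
    for p, q in combinations(sorted(go_terms), 2):
        identity = seq_identity.get(frozenset((p, q)))
        # Skip pairs without an alignment score or with dissimilar sequences.
        if identity is None or identity < min_identity:
            continue
        similarity = jaccard(go_terms[p], go_terms[q])
        if similarity <= max_function_sim:
            flagged.append((p, q, identity, similarity))
    return flagged


# Hypothetical sample data: protein names and GO identifiers are placeholders.
example_go = {
    "protA": {"GO:0042221", "GO:0006950"},
    "protB": {"GO:0048468", "GO:0050789"},
    "protC": {"GO:0042221", "GO:0006950"},
}
example_identity = {
    frozenset(("protA", "protB")): 0.85,
    frozenset(("protA", "protC")): 0.90,
    frozenset(("protB", "protC")): 0.40,
}

if __name__ == "__main__":
    for pair in divergent_pairs(example_go, example_identity):
        print(pair)
```

In this toy input only the pair protA and protB is flagged: their sequences are similar, yet their predicted process sets do not overlap. A semantic similarity measure that uses the Gene Ontology hierarchy would likely be a better choice than plain set overlap in a real analysis.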

Keywords