Journal of Integrative Bioinformatics (Mar 2009)

Global sequence properties for superfamily prediction: a machine learning approach

  • Dobson Richard JB,
  • Munroe Patricia B,
  • Caulfield Mark J,
  • Saqi Mansoor

DOI
https://doi.org/10.2390/biecoll-jib-2009-109
Journal volume & issue
Vol. 6, no. 1
pp. 25 – 49

Abstract


Functional annotation of a protein sequence is difficult in the absence of experimental data or clear similarity to a sequence of known function. In this study, a simple set of sequence attributes based on physicochemical and predicted structural characteristics was used as input to machine learning methods. To improve performance by increasing the amount of data available for training, a sequence enrichment technique was explored. These methods were used to predict membership of 24 and 49 large and diverse protein superfamilies from the SCOP database.
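The abstract does not list the exact attributes used, so the following is only a minimal sketch of how global, alignment-free sequence properties can be turned into a fixed-length feature vector suitable as machine learning input. The feature set (length, amino acid composition, mean Kyte-Doolittle hydropathy, charged-residue fraction, net charge per residue) and the names `global_features` and `KYTE_DOOLITTLE` are illustrative assumptions, not the authors' method. Predicted structural characteristics (e.g. secondary structure fractions) would require an external predictor and are not computed here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Fixed alphabetical order so every vector has the same layout.
AMINO_ACIDS = sorted(KYTE_DOOLITTLE)


def global_features(seq: str) -> list[float]:
    """Return a 24-element vector of global sequence properties.

    Hypothetical feature set: [length, 20 composition fractions,
    mean hydropathy (GRAVY), fraction of charged residues,
    net charge per residue]. Non-standard residues (e.g. X) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in KYTE_DOOLITTLE]
    n = len(residues)
    if n == 0:
        raise ValueError("sequence contains no standard amino acids")

    counts = Counter(residues)
    # Fraction of each standard residue in the sequence.
    composition = [counts[aa] / n for aa in AMINO_ACIDS]
    # Grand average of hydropathy over the counted residues.
    gravy = sum(KYTE_DOOLITTLE[aa] for aa in residues) / n
    # Crude charge descriptors: K/R positive, D/E negative (His ignored).
    positive = counts["K"] + counts["R"]
    negative = counts["D"] + counts["E"]
    charged_fraction = (positive + negative) / n
    net_charge = (positive - negative) / n

    return [float(n), *composition, gravy, charged_fraction, net_charge]
```

Because every sequence maps to a vector of the same length regardless of its size, vectors for many sequences can be stacked into a matrix and passed to any standard classifier, with one label per superfamily.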