A Hierarchical Approach to Protein Fold Prediction

Mohammad Tabrez Anwar Shamim; Nagarajaram Hampapathalu Adimurthy

doi:10.2390/biecoll-jib-2011-185

Journal of Integrative Bioinformatics (Mar 2011)

Affiliations

Mohammad Tabrez Anwar Shamim: Laboratory of Computational Biology, CDFD, Bldg.7, Gruhakalpa, Nampally, Hyderabad 500 001, http://www.cdfd.org.in, India
Nagarajaram Hampapathalu Adimurthy: Laboratory of Computational Biology, CDFD, Bldg.7, Gruhakalpa, Nampally, Hyderabad 500 001, http://www.cdfd.org.in, India

DOI: https://doi.org/10.2390/biecoll-jib-2011-185
Journal volume & issue: Vol. 8, no. 1
pp. 66 – 77

Abstract

Fold recognition, the assignment of novel proteins to known structures, is an important component of the overall protein structure discovery process. Available methods for protein fold recognition are limited by low fold coverage, low prediction accuracy, or both. We describe here a new Support Vector Machine (SVM)-based method for protein fold prediction that offers both high prediction accuracy and high fold coverage. To make the method suitable for large-scale fold prediction, it was trained and tested on a large number of folds. However, the large number of folds in the training set made the classification task difficult, because it increased the complexity of the binary classifications performed by the SVMs. To overcome this complexity we adopted a hierarchical approach in which fold prediction is made in two steps: first the structural class of the query is predicted, and then the fold is predicted within the predicted structural class. This decreased the complexity of the classification problem and improved the overall fold prediction accuracy. To the best of our knowledge, this is the first taxonomic fold recognition method to cover over 700 protein folds, and it gives a prediction accuracy of around 70% on a benchmark dataset. Because the new method gives state-of-the-art prediction performance, it can be very useful for the structural characterization of proteins discovered in various genomes.
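
The two-step scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy nearest-centroid classifier stands in for the SVMs so that the example needs only the standard library, and the 2-D feature vectors, the class names, and the fold labels (`a.1`, `a.2`, `b.1`, `b.2`) are all made up for illustration. Only the control flow is meant to carry over: one model predicts the structural class, then a per-class model predicts the fold among the folds of that class.

```python
from collections import defaultdict
from math import dist


class CentroidClassifier:
    """Toy stand-in for an SVM (assumption): predicts the nearest class centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for features, label in zip(X, y):
            groups[label].append(features)
        # One centroid per label: the column-wise mean of its training vectors.
        self.centroids = {
            label: tuple(sum(col) / len(rows) for col in zip(*rows))
            for label, rows in groups.items()
        }
        return self

    def predict(self, features):
        return min(self.centroids,
                   key=lambda label: dist(features, self.centroids[label]))


class HierarchicalFoldPredictor:
    """Step 1 predicts the structural class; step 2 predicts the fold within it."""

    def fit(self, X, classes, folds):
        # Step 1 model: trained on all samples with structural-class labels.
        self.class_model = CentroidClassifier().fit(X, classes)
        # Step 2 models: one per structural class, trained only on that
        # class's samples, so each model separates far fewer folds.
        by_class = defaultdict(lambda: ([], []))
        for features, cls, fold in zip(X, classes, folds):
            by_class[cls][0].append(features)
            by_class[cls][1].append(fold)
        self.fold_models = {
            cls: CentroidClassifier().fit(xs, fs)
            for cls, (xs, fs) in by_class.items()
        }
        return self

    def predict(self, features):
        cls = self.class_model.predict(features)
        fold = self.fold_models[cls].predict(features)
        return cls, fold


# Hypothetical toy data: two structural classes, two folds in each.
X = [(0.0, 0.0), (0.2, 0.1), (0.0, 2.0), (0.1, 2.2),
     (5.0, 0.0), (5.2, 0.1), (5.0, 2.0), (5.1, 2.2)]
classes = ["alpha"] * 4 + ["beta"] * 4
folds = ["a.1", "a.1", "a.2", "a.2", "b.1", "b.1", "b.2", "b.2"]

predictor = HierarchicalFoldPredictor().fit(X, classes, folds)
print(predictor.predict((0.1, 1.9)))  # ('alpha', 'a.2')
print(predictor.predict((4.9, 0.2)))  # ('beta', 'b.1')
```

One consequence of this design is visible in the sketch: a wrong structural-class prediction at step 1 cannot be recovered at step 2, because the second-step model only knows the folds of the predicted class.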

Published in Journal of Integrative Bioinformatics

ISSN: 1613-4516 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.degruyter.com/view/j/jib
