Ranking near-native candidate protein structures via random forest classification

Hongjie Wu; Hongmei Huang; Weizhong Lu; Qiming Fu; Yijie Ding; Jing Qiu; Haiou Li

doi:10.1186/s12859-019-3257-8

BMC Bioinformatics (Dec 2019)

Affiliations

Hongjie Wu: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Hongmei Huang: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Weizhong Lu: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Qiming Fu: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Yijie Ding: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Jing Qiu: School of Electronic and Information Engineering, Suzhou University of Science and Technology
Haiou Li: School of Electronic and Information Engineering, Suzhou University of Science and Technology

Journal volume & issue: Vol. 20, no. S25
pp. 1 – 13

Abstract

Background

In ab initio protein-structure prediction, a large set of structural decoys is often generated, and the best five or three candidates must then be selected from these decoys. The central structures of the clusters with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy. However, limitations in clustering methods and in three-dimensional structural-distance assessment make it difficult to identify the exact order of the best five or three near-native candidate structures.

Results

To address this issue, we propose a method that re-ranks the candidate structures via random forest classification, using intra- and inter-cluster features derived from the clustering results. Comparative analysis indicated that our method identified the order of the candidate structures better than the current methods SPICKER, Calibur, and Durandal. The first model identified by our method was closer to the native structure in 12 of 43 cases, versus four for SPICKER, and was the same as the native structure in up to 27 of 43 cases, versus 14 for Calibur, and in up to eight of 43 cases, versus two for Durandal.

Conclusions

In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking candidate structures into a binary classification problem. Our results indicate that this method is effective for this problem and outperforms SPICKER, Calibur, and Durandal.
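The abstract does not specify the clustering parameters, the exact intra- and inter-cluster features, or the random forest settings, so the following Python is only a minimal sketch, under stated assumptions, of how re-ranking cluster centers can be framed as binary classification. It assumes a precomputed, symmetric pairwise distance matrix between decoys (for example RMSD), uses a simplified greedy neighbor-count clustering in the spirit of SPICKER, and uses three hypothetical features per cluster center: cluster size, mean distance to members, and distance to the nearest other center. The small pure-Python random forest (bootstrap samples, a random feature subset at each split, shallow Gini trees) stands in for a library implementation. All function names are hypothetical, and this is not the authors' code. Training labels (1 for near-native, 0 otherwise) are likewise assumed to come from benchmark targets whose native structures are known.

```python
import random
from statistics import mean

# Minimal sketch: the clustering scheme, features, and forest settings below
# are illustrative assumptions, not the authors' implementation.


def cluster_by_neighbors(dist, cutoff):
    """Greedy neighbor-count clustering on a precomputed, symmetric pairwise
    distance matrix (e.g. RMSD between decoys). The structure with the most
    unassigned neighbors within `cutoff` becomes a cluster center; it and its
    neighbors are removed, and the process repeats."""
    remaining = set(range(len(dist)))
    clusters = []  # list of (center index, member indices)
    while remaining:
        pool = sorted(remaining)  # sorted so ties go to the lowest index
        center = max(pool, key=lambda i: sum(dist[i][j] <= cutoff for j in pool))
        members = [j for j in pool if dist[center][j] <= cutoff]
        clusters.append((center, members))
        remaining -= set(members)
    return clusters


def cluster_features(dist, clusters):
    """Hypothetical features per cluster center: two intra-cluster features
    (cluster size, mean distance to members) and one inter-cluster feature
    (distance to the nearest other center)."""
    centers = [c for c, _ in clusters]
    feats = []
    for c, members in clusters:
        spread = mean(dist[c][m] for m in members)
        others = [dist[c][o] for o in centers if o != c]
        nearest = min(others) if others else 0.0
        feats.append([float(len(members)), float(spread), float(nearest)])
    return feats


def gini(labels):
    """Gini impurity of a non-empty list of 0/1 labels."""
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(X, y, depth, rng, n_feats):
    """Shallow CART tree; a leaf stores the fraction of near-native (1) labels."""
    if depth == 0 or len(set(y)) <= 1:
        return sum(y) / len(y)
    best = None
    # Only a random subset of features is considered at each split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted(set(row[f] for row in X)):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split among the sampled features
        return sum(y) / len(y)
    _, f, t, left, right = best
    lo = build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, rng, n_feats)
    hi = build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, rng, n_feats)
    return (f, t, lo, hi)


def predict_tree(node, x):
    """Follow splits (feature, threshold, left, right) down to a leaf value."""
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if x[f] <= t else hi
    return node


def train_forest(X, y, n_trees=25, depth=3, seed=0):
    """Each tree is grown on a bootstrap sample of the training clusters."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 depth, rng, n_feats))
    return forest


def predict_proba(forest, x):
    """Average leaf value over trees: an estimate of P(near-native)."""
    return mean(predict_tree(tree, x) for tree in forest)


def rerank(forest, dist, clusters):
    """Return cluster-center indices ordered by predicted P(near-native)."""
    feats = cluster_features(dist, clusters)
    scores = [predict_proba(forest, f) for f in feats]
    order = sorted(range(len(clusters)), key=lambda k: -scores[k])
    return [clusters[k][0] for k in order]
```

In this sketch, each center is scored by the average near-native vote of the trees, and `rerank` sorts the centers by that score to produce a new order for the top five or three candidates. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` (with its `predict_proba` method) would replace the toy forest.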

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
