A random forest learning assisted “divide and conquer” approach for peptide conformation search

Xin Chen; Bing Yang; Zijing Lin

doi:10.1038/s41598-018-27167-w

Scientific Reports (Jun 2018)

Affiliations

Xin Chen: Hefei National Laboratory for Physical Sciences at Microscales & CAS Key Laboratory of Strongly-Coupled Quantum Matter Physics, Department of Physics, University of Science and Technology of China
Bing Yang: Hefei National Laboratory for Physical Sciences at Microscales & CAS Key Laboratory of Strongly-Coupled Quantum Matter Physics, Department of Physics, University of Science and Technology of China
Zijing Lin: Hefei National Laboratory for Physical Sciences at Microscales & CAS Key Laboratory of Strongly-Coupled Quantum Matter Physics, Department of Physics, University of Science and Technology of China

DOI: https://doi.org/10.1038/s41598-018-27167-w
Journal volume & issue: Vol. 8, no. 1
pp. 1 – 8

Abstract

Computational determination of peptide conformations is challenging because it requires finding minima in a high-dimensional space. The “divide and conquer” approach is promising for reliably reducing the size of the search space. A random forest learning model is proposed here to expand the scope of applicability of the “divide and conquer” approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units (“words”). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units (“grammar”). It is found that amino acid residues may be grouped as equivalent “words”, while the φ-ψ combinations in low-energy peptide conformations follow a distinct “grammar”. The finding of equivalent words gives the “divide and conquer” method the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the “divide and conquer” method by removing unfavorable φ-ψ combinations without the need for dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG by assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.
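
The idea of pruning unfavorable φ-ψ combinations before assembly can be illustrated with a minimal Python sketch. It is not the authors' implementation: the region labels in `CANDIDATES`, the lookup table `FORBIDDEN_PAIRS`, and the function names are hypothetical and chosen only for illustration. In the paper the filtering role is played by a trained random forest model, which is replaced here by a fixed table of adjacent pairs assumed to be unfavorable.

```python
from itertools import product

# Hypothetical phi-psi region labels ("words") available to each residue type.
# These labels are illustrative assumptions, not values from the paper.
CANDIDATES = {
    "G": ["alpha", "beta", "ppII"],
    "A": ["alpha", "beta"],
}

# Hypothetical stand-in for the learnt "grammar": adjacent label pairs assumed
# to be unfavorable. The paper uses a random forest model for this decision.
FORBIDDEN_PAIRS = {("alpha", "ppII"), ("ppII", "alpha")}


def is_allowed(combo):
    """Return True if no adjacent pair of labels is marked unfavorable."""
    return all((a, b) not in FORBIDDEN_PAIRS for a, b in zip(combo, combo[1:]))


def count_unfiltered(sequence):
    """Number of label combinations before any grammar filtering."""
    total = 1
    for residue in sequence:
        total *= len(CANDIDATES[residue])
    return total


def assemble(sequence):
    """Enumerate per-residue label combinations and keep those passing the filter."""
    pools = [CANDIDATES[residue] for residue in sequence]
    return [combo for combo in product(*pools) if is_allowed(combo)]


if __name__ == "__main__":
    for seq in ("GGG", "AAA"):
        kept = assemble(seq)
        print(seq, "candidates:", count_unfiltered(seq), "kept:", len(kept))
```

With these toy inputs the filter discards some of the GGG combinations and none of the AAA ones. Only the surviving combinations would be passed on to the costly energy evaluation, which is where the efficiency gain described in the abstract would come from.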

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
