Near-native protein loop sampling using nonparametric density estimation accommodating sparcity.

Hyun Joo; Archana G Chavan; Ryan Day; Kristin P Lennox; Paul Sukhanov; David B Dahl; Marina Vannucci; Jerry Tsai

doi:10.1371/journal.pcbi.1002234

PLoS Computational Biology (Oct 2011)


Journal volume & issue: Vol. 7, no. 10, p. e1002234

Abstract


Unlike the core structural elements of a protein, such as regular secondary structure, loop regions are difficult for template-based modeling (TBM) because of their variability in sequence and structure and because a limited number of homologous templates samples them only sparsely. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated from samples of these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated on a diverse test set using a leave-one-out approach. The method produced candidates as low as 0.45 Å RMSD, with a worst case of 3.66 Å. For canonical loops such as the immunoglobulin complementarity-determining regions (mean RMSD 7.0 Å), this sampling method produces a population of loop structures down to around 3.66 Å RMSD for loops of up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample structures nearer to the native for both the canonical CDRH1 and the non-canonical CDRH3 loops. Lastly, under the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that the DPM-HMM method offers an advantage by consistently sampling near-native loop structures. The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/.
