Protein Structure Classification and Loop Modeling Using Multiple Ramachandran Distributions

Seyed Morteza Najibi; Mehdi Maadooliat; Lan Zhou; Jianhua Z. Huang; Xin Gao

Computational and Structural Biotechnology Journal (Jan 2017)

Affiliations

Seyed Morteza Najibi: Department of Statistics, College of Sciences, Shiraz University, Shiraz, Iran
Mehdi Maadooliat: Department of Mathematics, Statistics and Computer Science, Marquette University, WI 53201-1881, USA; Center for Human Genetics, Marshfield Clinic Research Institute, Marshfield, WI 54449, USA
Lan Zhou: Department of Statistics, Texas A&M University, TX 77843-3143, USA
Jianhua Z. Huang: Department of Statistics, Texas A&M University, TX 77843-3143, USA
Xin Gao (corresponding author): Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia

Journal volume & issue: Vol. 15, pp. 243–254

Abstract

Recently, the study of protein structures using angular representations has attracted much attention among structural biologists. The main challenge is to efficiently model the continuous conformational space of protein structures based on the differences and similarities between different Ramachandran plots. Although statistical methods for modeling protein angular data exist, there is still a substantial need for more sophisticated and faster statistical tools to model large-scale circular datasets. To address this need, we have developed a nonparametric method for the collective estimation of multiple bivariate density functions for a collection of populations of protein backbone angles. The proposed method accounts for the circular nature of the angular data by using trigonometric splines, which are more efficient than existing methods. This collective density estimation approach is widely applicable whenever multiple density functions must be estimated from different populations that share common features. Moreover, the coefficients of the adaptive basis expansion for the fitted densities provide a low-dimensional representation that is useful for visualization, clustering, and classification of the densities. The proposed method provides a novel and unique perspective on two important and challenging problems in protein structure research: structure-based protein classification and angular-sampling-based protein loop structure prediction.

Keywords: Bivariate splines, Log-spline density estimation, Protein structure, Ramachandran distribution, Roughness penalty, Trigonometric B-spline, Protein classification, SCOP

Published in Computational and Structural Biotechnology Journal

ISSN: 2001-0370 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology: Chemical technology: Biotechnology
Website: https://www.journals.elsevier.com/computational-and-structural-biotechnology-journal
