BMC Bioinformatics (Dec 2018)
Computational prediction of plasma protein binding of cyclic peptides from small molecule experimental data using sparse modeling techniques
Abstract
Background: Cyclic peptide-based drug discovery is attracting increasing interest owing to its potential to avoid target protein depletion. In drug discovery, it is important to maintain the biostability of a drug within the proper range. Plasma protein binding (PPB) is the most important index of biostability, and a computational method for predicting the PPB of drug candidate compounds would help accelerate drug discovery research. Machine learning has already been used to predict the PPB of small molecule drug compounds; however, no study has investigated cyclic peptides because experimental data on cyclic peptides are scarce.
Results: We adopted sparse modeling and small molecule data to construct a PPB prediction model for cyclic peptides. Because cyclic peptide data are limited, applying multidimensional nonlinear models raises concerns about overfitting. Models constructed by sparse modeling, however, can avoid overfitting while offering high generalization performance and interpretability. More than 1000 PPB data points for small molecules are available, and we used them to construct prediction models with two enumeration methods: enumerating lasso solutions (ELS) and forward beam search (FBS). The accuracies of the prediction models constructed by ELS and FBS were equal to or better than those of conventional non-linear models (MAE = 0.167–0.174) on cross-validation of a small molecule compound dataset. Moreover, we showed that the prediction accuracies for cyclic peptides were close to those for small molecule compounds (MAE = 0.194–0.288). Such accuracy could not be obtained by the simpler approach of learning directly from cyclic peptide data with lasso regression (MAE = 0.286–0.671) or ridge regression (MAE = 0.244–0.354).
Conclusion: In this study, we proposed a machine learning technique that uses low-dimensional sparse modeling to predict the PPB value of cyclic peptides computationally. The low-dimensional sparse model not only exhibits excellent generalization performance but also improves the interpretability of the prediction model. This can provide common and noteworthy knowledge for future cyclic peptide drug discovery studies.
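To make the idea of a sparse linear model evaluated by cross-validated MAE concrete, the following is a minimal sketch of plain lasso regression fitted by coordinate descent. It is not the authors' ELS or FBS procedure, which enumerate multiple candidate models; it only illustrates the underlying lasso step. The data are synthetic, and the feature count, the penalty `alpha`, and all function names are assumptions made for illustration rather than details taken from the study.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso(X, y, alpha, n_iter=100):
    """Minimize (1/(2n)) * sum((y - b - Xw)^2) + alpha * sum(|w|)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(y) / n
    resid = [y[i] - b for i in range(n)]  # residual y - b - Xw with w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
        # Refit the unpenalized intercept.
        shift = sum(resid) / n
        b += shift
        resid = [r - shift for r in resid]
    return w, b


def predict(X, w, b):
    return [b + sum(xj * wj for xj, wj in zip(row, w)) for row in X]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - c) for a, c in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mae(X, y, alpha, k=5):
    """Average held-out MAE over k interleaved folds."""
    n = len(X)
    errors = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        w, b = fit_lasso(X_tr, y_tr, alpha)
        errors.append(mean_absolute_error(y_te, predict(X_te, w, b)))
    return sum(errors) / k


# Synthetic stand-in for molecular descriptors (hypothetical, not real PPB data):
# only the first two of eight features influence the target.
random.seed(0)
n, p = 100, 8
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0.8 * row[0] - 0.5 * row[1] + random.gauss(0, 0.05) for row in X]

w, b = fit_lasso(X, y, alpha=0.1)
cv_mae = cross_validated_mae(X, y, alpha=0.1)
print("coefficients:", [round(v, 3) for v in w])
print("5-fold CV MAE:", round(cv_mae, 3))
```

In this toy setting the L1 penalty drives the coefficients of the uninformative features to zero, which is the property that keeps such models low-dimensional and easy to interpret. In practice one would typically use an established library implementation and real descriptor data instead of this hand-written loop.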
Keywords