Mathematical Biosciences and Engineering (Jul 2021)

TYLER, a fast method that accurately predicts cyclin-dependent proteins by using computation-based motifs and sequence-derived features

  • Jian Zhang,
  • Xingchen Liang,
  • Feng Zhou,
  • Bo Li,
  • Yanling Li

DOI
https://doi.org/10.3934/mbe.2021318
Journal volume & issue
Vol. 18, no. 5
pp. 6410 – 6429

Abstract

Cyclins and their related cyclin-dependent kinases play vital roles in regulating progression through the cell cycle. Understanding the intrinsic mechanisms of cyclins promises insight into uncontrolled cell proliferation and the prevention of cancer. Accurate recognition of cyclins is therefore important for the investigation of tumor cells and for biomedical engineering. This study proposes a novel sequence-based predictor named TYLER (predicT cYcLin-dEpendent pRoteins) to address the long-standing challenge of predicting cyclin-dependent proteins (CDPs). We use information theory to compute selectively enriched CDP-related motifs and build a motif-based model. For proteins that do not share the enriched motifs, we compute sequence-derived features and construct machine learning-based models. We then optimize the weights of the two types of models to build a more accurate predictor. We evaluate both types of models using 5-fold cross-validation on the TRAINING dataset. We show that combining the two models and optimizing the corresponding weights yields good and robust results on both the TRAINING dataset and the independent TEST dataset. The empirical test demonstrates that TYLER is a robust predictor and is statistically significantly better than current methods. The runtime assessment shows that TYLER is an efficient, high-throughput method. We use TYLER to make predictions on the human proteome and use the results to hypothesize CDPs. Recently experimentally verified CDPs and GO analysis suggest that some of our novel predictions are potential CDPs. TYLER is implemented as a public, user-friendly web server at http://www.inforstation.com/webservers/TYLER/. We share all data and source code used in this research at https://github.com/biocomputinglab/TYLER.git.
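
The abstract does not say how the weights of the two models are optimized. As a rough illustration only, the Python sketch below assumes each model outputs a score in [0, 1], combines the two scores as a weighted average, and picks the weight by a simple grid search that maximizes accuracy on labeled training data. The function names (`combine_scores`, `accuracy`, `best_weight`), the 0.5 decision threshold, the grid search itself, and the toy numbers are all assumptions made for this example and are not taken from the TYLER source code.

```python
from typing import List, Sequence, Tuple


def combine_scores(motif_score: float, ml_score: float, w: float) -> float:
    """Weighted average of two model outputs; w is the motif model's weight."""
    return w * motif_score + (1.0 - w) * ml_score


def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of proteins whose thresholded score matches the label."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def best_weight(motif_scores: Sequence[float], ml_scores: Sequence[float],
                labels: Sequence[int], steps: int = 10) -> Tuple[float, float]:
    """Grid-search w over 0, 1/steps, ..., 1 and keep the first best accuracy."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        combined: List[float] = [
            combine_scores(m, s, w) for m, s in zip(motif_scores, ml_scores)
        ]
        acc = accuracy(combined, labels)
        if acc > best_acc:  # strict '>' keeps the smallest w among ties
            best_w, best_acc = w, acc
    return best_w, best_acc


# Toy, made-up scores for four proteins (1 = CDP, 0 = non-CDP).
motif_scores = [1.0, 0.0, 0.0, 0.0]   # hypothetical motif-model outputs
ml_scores = [0.2, 0.9, 0.1, 0.3]      # hypothetical ML-model outputs
labels = [1, 1, 0, 0]

w, acc = best_weight(motif_scores, ml_scores, labels)
print(f"selected weight for motif model: {w:.1f}, training accuracy: {acc:.2f}")
```

In this toy data neither model alone classifies all four proteins correctly, but a mixed weight does, which is the intuition behind combining the two. In practice the weight would be selected inside the cross-validation loop rather than on the full training set, to avoid an optimistic estimate.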

Keywords