IEEE Access (Jan 2020)
Prediction of Cyclin Protein Using Two-Step Feature Selection Technique
Abstract
Cyclins are a family of proteins that regulate the cell cycle by activating cyclin-dependent kinases (CDKs), a group of enzymes required for the cell cycle. Constructing a model to classify Cyclins is important for understanding their function. Because Cyclins share low sequence similarity with one another, there is a pressing need for a machine-learning-based model to identify them. In this study, a method based on a support vector machine (SVM) is developed to recognize Cyclins using only amino acid sequence information. Eighteen feature descriptors, comprising 13,151 feature dimensions in total, were extracted, and the feature dimension was reduced to 8 through a two-step feature selection technique. The retained features indicate that descriptors such as Autocorrelation, amino acid composition (AAC) and CTDC are important for identifying Cyclins. Jackknife cross-validation results indicate that our model classifies Cyclins with an accuracy of 91.9%, which is higher than that of a recent study using the same data set. Our work provides an important tool for discriminating Cyclins.
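To make two of the ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation, of the AAC descriptor (the fraction of each of the 20 standard amino acids in a sequence) and of jackknife (leave-one-out) evaluation, in which every sample is predicted by a model trained on all remaining samples. The function names `aac`, `jackknife_accuracy` and `nearest_centroid` are hypothetical. `nearest_centroid` is a deliberately simple stand-in for the SVM used in the study, chosen only so the example runs with the Python standard library; the toy sequences are invented for illustration and are not from the study's data set.

```python
from collections import Counter
from typing import Callable, List, Sequence

# The 20 standard amino acids, in a fixed one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> List[float]:
    """Amino acid composition: fraction of each standard residue.

    Non-standard letters (e.g. X, B, Z) are ignored in both the counts
    and the total, so the 20 values sum to 1 for any non-empty sequence.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def jackknife_accuracy(
    features: Sequence[List[float]],
    labels: Sequence[int],
    fit_predict: Callable[[Sequence[List[float]], Sequence[int], List[float]], int],
) -> float:
    """Leave-one-out accuracy: each sample is held out once and predicted
    by a model fitted on all the other samples."""
    correct = 0
    for i in range(len(features)):
        train_x = [x for j, x in enumerate(features) if j != i]
        train_y = [y for j, y in enumerate(labels) if j != i]
        if fit_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


def nearest_centroid(train_x, train_y, query):
    """Hypothetical stand-in classifier (NOT the paper's SVM): predict the
    class whose mean feature vector is closest in squared Euclidean distance."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(query, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Invented toy data: label 1 = "Cyclin-like", label 0 = "other".
    seqs = ["AAAA", "AAAC", "AACA", "WWWW", "WWWY", "WWYW"]
    labels = [1, 1, 1, 0, 0, 0]
    feats = [aac(s) for s in seqs]
    print(jackknife_accuracy(feats, labels, nearest_centroid))
```

In the study's setting, `nearest_centroid` would be replaced by an SVM and `aac` would be only one of the 18 descriptors, with the selected 8 dimensions fed to the classifier. Because the jackknife refits the model once per sample, its cost grows linearly with the data set size times the cost of one training run.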
Keywords