IEEE Access (Jan 2019)
IMKPse: Identification of Protein Malonylation Sites by the Key Features Into General PseAAC
Abstract
Lysine malonylation is currently regarded as one of the key protein post-translational modifications in biology, and it plays a significant role in the regulation of several biological processes. Accurately identifying malonylation sites will therefore contribute to understanding these processes. Experimental approaches to identifying such modification sites are time-consuming and laborious, so computational approaches to identifying these sites are needed. In this paper, we propose the IMKPse model, which uses general PseAAC as the classification features and a flexible neural tree as the classification model. To deal with the overfitting problem, we use independent datasets for each species. More specifically, the algorithm initially takes amino acid properties from the general PseAAC as candidate features. By comparing the candidate features, the method identifies the top five features among them. When evaluated on the testing sets, IMKPse obtained MCC values of 0.9185, 0.9097, and 0.9525 for three species, E. coli, M. musculus, and H. sapiens, respectively. On the independent sets, IMKPse obtained MCC values of 0.9149, 0.9060, and 0.9467, respectively. In addition, we construct several combinations of the top five features. The results demonstrate that the proposed algorithm outperforms other approaches. A user-friendly web resource for IMKPse is available at http://121.250.173.184.
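For reference, MCC here denotes the Matthews correlation coefficient, which ranges from -1 to 1 and is computed from the four confusion-matrix counts of a binary classifier. The following is a minimal Python sketch of the standard definition, not the IMKPse implementation; the function name `mcc_from_counts` and the example counts are hypothetical and do not come from the paper.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    tp/tn: correctly predicted malonylation / non-malonylation sites.
    fp/fn: incorrectly predicted malonylation / non-malonylation sites.
    Returns 0.0 when the denominator is zero (a common convention,
    assumed here).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not results from the paper).
example_mcc = mcc_from_counts(tp=90, tn=95, fp=5, fn=10)
print(f"MCC = {example_mcc:.4f}")
```

Unlike plain accuracy, MCC uses all four counts, so it remains informative when positive and negative sites are unevenly represented.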
Keywords