IMKPse: Identification of Protein Malonylation Sites by the Key Features Into General PseAAC

Wenzheng Bao; Bin Yang; De-Shuang Huang; Dong Wang; Qi Liu; Yue-Hui Chen; Rong Bao

doi:10.1109/ACCESS.2019.2900275

IEEE Access (Jan 2019)


Affiliations

Wenzheng Bao: School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China
Bin Yang: School of Information Science and Engineering, Zaozhuang University, Zaozhuang, China
De-Shuang Huang: Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai, China
Dong Wang: School of Information Science, University of Jinan, Jinan, China
Qi Liu: Affiliated Hospital, Xuzhou Medical University, Xuzhou, China
Yue-Hui Chen: School of Information Science, University of Jinan, Jinan, China
Rong Bao: School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, China

DOI: https://doi.org/10.1109/ACCESS.2019.2900275
Journal volume & issue: Vol. 7, pp. 54073 – 54083

Abstract


Lysine malonylation is currently regarded as one of the key protein post-translational modifications in biology, and lysine plays a significant role in the regulation of several biological processes. Accurate identification of this modification type will therefore contribute to understanding these biological processes. Experimental approaches to identifying such modification sites are time-consuming and laborious, so computational approaches for identifying these sites are needed. In this paper, we propose the IMKPse model, which uses general PseAAC as the classification features and a flexible neural tree as the classification model. To deal with the overfitting problem, we used independent datasets for each species. More specifically, the algorithm first employs amino acid properties from the general PseAAC as candidate features. By comparing the candidate features, the method is able to find the top five among them. When evaluated on the testing sets of three species, E. coli, M. musculus, and H. sapiens, IMKPse obtained MCC values of 0.9185, 0.9097, and 0.9525, respectively. On the independent sets, IMKPse obtained MCC values of 0.9149, 0.9060, and 0.9467, respectively. In addition, we formed combinations of the top five features. The results demonstrate that the proposed algorithm performs better than other approaches. A user-friendly web resource for IMKPse is available at http://121.250.173.184.
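
The abstract reports performance as MCC (Matthews correlation coefficient). For readers unfamiliar with the metric, the following is a minimal sketch of its standard definition for binary classification. The function name `mcc` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp/tn/fp/fn are true positives, true negatives, false positives and
    false negatives. The result lies in [-1, 1].
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: return 0 when any marginal total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not results from the paper).
example_score = mcc(tp=90, tn=95, fp=5, fn=10)
print(f"MCC = {example_score:.4f}")
```

A value near 1 means predictions agree closely with the labels, 0 corresponds to chance-level agreement, and -1 to complete disagreement. Because it uses all four confusion-matrix cells, MCC is often preferred over accuracy when positive and negative sites are imbalanced.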

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
