IEEE Access (Jan 2020)

cACP-2LFS: Classification of Anticancer Peptides Using Sequential Discriminative Model of KSAAP and Two-Level Feature Selection Approach

  • Shahid Akbar,
  • Maqsood Hayat,
  • Muhammad Tahir,
  • Kil To Chong

DOI
https://doi.org/10.1109/ACCESS.2020.3009125
Journal volume & issue
Vol. 8
pp. 131939 – 131948

Abstract

Cancer is a leading cause of death worldwide; it occurs when cellular changes cause abnormal cell growth and division. Conventional treatments, such as existing therapies and wet-lab experimental methods, are considered unsatisfactory because of their high cost and laborious nature. The recent introduction of anticancer peptides (ACPs), however, offers an effective way to treat cancer-affected cells. Because of the rapid growth in the number of biological sequences, accurate identification of ACPs has become a difficult task for scientists. Given the importance of ACPs, an efficient and reliable intelligent model is needed to identify them accurately. In this study, three encoding schemes of different natures are employed to extract features from peptide sequences. First, K-space amino acid pair (KSAAP) is used to extract highly correlated and effective descriptors. In addition to these sequential features, composite physicochemical properties are applied to gather local structure descriptors. Furthermore, autocovariance is used to represent the intrinsic residue information of amino acids. A novel two-level feature selection (2LFS) method is then used to select highly discriminative features and to reduce the dimensionality of the proposed descriptors. Finally, to examine the performance of the proposed model, several learning hypotheses are evaluated to select the best-performing classifier. To measure generalization capability, two diverse benchmark datasets are used. In the empirical results, KSAAP combined with 2LFS achieved high classification performance on both datasets. Overall, the proposed cACP-2LFS achieved approximately 11% higher accuracy than existing models in the literature. The proposed model is expected to be useful in medicine, proteomics, and academic research.
The source code and all datasets are publicly available at https://github.com/shahidawkum/cACP-2LFS.
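
To make the KSAAP encoding named in the abstract more concrete, the following is a minimal Python sketch of one common way to compute spaced amino acid pair frequencies. It is not taken from the authors' repository: the function name `ksaap_features`, the parameter `k_max`, the 20-letter alphabet ordering, and the normalization by the number of windows are all assumptions made for illustration, since the abstract does not specify them.

```python
from itertools import product

# Standard 20 amino acids; the ordering here is an assumption for illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ksaap_features(sequence, k_max=2):
    """Return pair frequencies for residue pairs separated by 0..k_max residues.

    Hypothetical helper: for each gap k, count every pair (seq[i], seq[i+k+1]),
    then divide by the number of windows of that gap in the sequence.
    The output has 400 * (k_max + 1) values.
    """
    seq = sequence.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(pairs, 0)
        n_windows = len(seq) - k - 1
        for i in range(max(n_windows, 0)):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:  # skip pairs containing non-standard residues
                counts[pair] += 1
        total = max(n_windows, 1)  # avoid division by zero on short sequences
        features.extend(counts[p] / total for p in pairs)
    return features


if __name__ == "__main__":
    vector = ksaap_features("ACAC", k_max=1)
    print(len(vector))
```

Under these assumptions each peptide maps to a fixed-length vector regardless of its length, which is what makes a subsequent feature selection step such as 2LFS applicable; the actual gap range and normalization used in cACP-2LFS may differ.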

Keywords