IEEE Access (Jan 2025)

Anticancer Peptides Classification Using Long-Short-Term Memory With Novel Feature Representation

  • Nazer Al Tahifah,
  • Muhammad Sohail Ibrahim,
  • Erum Rehman,
  • Naveed Ahmed,
  • Abdul Wahab,
  • Shujaat Khan

DOI
https://doi.org/10.1109/ACCESS.2024.3523068
Journal volume & issue
Vol. 13
pp. 67 – 79

Abstract

Cancer treatment is a challenging endeavor because of the intricacy, heterogeneity, and diversity of cancer causes. Comprehensive therapeutic approaches are crucial for cancer treatment. Anticancer peptides (ACPs) present a potentially effective therapeutic option. However, the large-scale identification and synthesis of these peptides remains a persistent difficulty that calls for effective prediction techniques. Existing techniques either suffer from low accuracy or employ high-dimensional feature sets, frequently producing sparse features and leading to ineffective model designs. This work presents a novel set of features and a long short-term memory (LSTM)-based classification strategy to create an efficient model. The suggested feature set includes three new and two modern feature extraction methods. The binary profile feature and the k-mer sparse matrix of the reduced amino acid alphabet make up the modern feature set. The novel features are derived from the composition of the K-spaced side chain pairs (CKSSCP), the composition of the K-spaced electrically charged side chain pairs (CKSECSCP), and the combination of [pk(CO2H)] + [pk(NH2)] + [pk(R)] + [isoelectric point]. The suggested LSTM model is trained using the combined feature set. The experiments are carried out with a k-fold cross-validation method on benchmark datasets. The results indicate that the proposed model outperforms alternative ACP classification techniques in terms of the Matthews correlation coefficient (MCC) and accuracy. On the ACP740 dataset with 5-fold cross-validation, it yields an MCC score of 75%, which is 12%, 11%, 3%, and 8% greater than those of the ACP-DL, ACP-DA, ACP-MHCNN, and ACP-KSRC approaches, respectively. On the ACP344 dataset with 10-fold cross-validation, the proposed method achieves an MCC score of 85.14%, which is 23% and 2% higher than the MCC scores of the ACP-DL and SAP methods, respectively.
Better classification performance offered by the proposed approach could help identify new ACPs and better understand their structural and chemical characteristics. The source code and the datasets are available on the author’s GitHub page (https://github.com/Shujaat123/ACP-LSTM-NFR).
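
As an illustration of two ideas named in the abstract, the following minimal Python sketch computes a k-spaced pair composition over side-chain groups and the Matthews correlation coefficient from confusion-matrix counts. It is not the authors' implementation: the five-class grouping in GROUPS, the normalization by the number of pairs, and the function names are assumptions made here for illustration, and the paper's actual CKSSCP definition may differ. The linked repository holds the reference code.

```python
from itertools import product
from math import sqrt

# Hypothetical side-chain grouping (assumption, not taken from the paper).
GROUPS = {
    "nonpolar": "GAVLIMP",
    "aromatic": "FWY",
    "polar": "STCNQ",
    "positive": "KRH",
    "negative": "DE",
}
RESIDUE_TO_GROUP = {aa: name for name, aas in GROUPS.items() for aa in aas}
PAIRS = list(product(GROUPS, repeat=2))  # 25 ordered group pairs


def k_spaced_group_pair_composition(seq, k):
    """Frequency of each ordered group pair with k residues in between."""
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        a = RESIDUE_TO_GROUP.get(seq[i])
        b = RESIDUE_TO_GROUP.get(seq[i + k + 1])
        if a is None or b is None:
            continue  # skip non-standard residues
        counts[(a, b)] += 1
        total += 1
    # Normalize so the vector sums to 1 (all zeros for very short peptides).
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0 when any marginal is empty and the ratio is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    features = k_spaced_group_pair_composition("KKDE", k=1)
    print(len(features), mcc(tp=5, tn=5, fp=0, fn=0))
```

Under these assumptions each value of k contributes a dense 25-dimensional vector, far smaller than the 400 dimensions that the same scheme gives over the full 20-letter amino acid alphabet.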

Keywords