Scientific Reports (Sep 2023)

Inferring linear-B cell epitopes using 2-step metaheuristic variant-feature selection using genetic algorithm

  • Pratik Angaitkar,
  • Turki Aljrees,
  • Saroj Kumar Pandey,
  • Ankit Kumar,
  • Rekh Ram Janghel,
  • Tirath Prasad Sahu,
  • Kamred Udham Singh,
  • Teekam Singh

DOI
https://doi.org/10.1038/s41598-023-41179-1
Journal volume & issue
Vol. 13, no. 1
pp. 1 – 12

Abstract

Linear B-cell epitopes (LBCE) play a vital role in vaccine design; thus, efficiently detecting them from protein sequences is of primary importance. These epitopes consist of amino acids arranged in continuous or discontinuous patterns. Vaccines employ attenuated viruses and purified antigens. LBCE stimulate humoral immunity in the body, where B and T cells target circulating infections. To predict LBCE, the underlying protein sequences undergo feature extraction, feature selection, and classification. Various system models have been proposed for this purpose, but their classification accuracy is only moderate. To enhance the accuracy of LBCE classification, this paper presents a novel 2-step metaheuristic variant-feature selection method that combines a linear support vector classifier (LSVC) with a Modified Genetic Algorithm (MGA). The feature selection model employs mono-peptide, dipeptide, and tripeptide features, focusing on the most diverse ones. The selected features are fed into a machine learning (ML)-based parallel ensemble classifier, which combines correctly classified instances from various classifiers, including k-Nearest Neighbor (kNN), random forest (RF), logistic regression (LR), and support vector machine (SVM). The ensemble classifier achieved an accuracy of 99.3%, which is higher than that of recent state-of-the-art models for linear B-cell epitope classification. Consequently, the entire system model can be utilised effectively in real-time clinical settings.
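To make the feature-extraction step concrete, the following is a minimal sketch of how mono-peptide, dipeptide, and tripeptide composition features might be computed from a protein sequence. It is not the authors' implementation: the function names `kmer_composition` and `peptide_features`, the fixed ordering of the 20 standard amino acids, and the normalisation by the number of sliding windows are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
# Assumption: non-standard residues such as 'X' get no feature of their own.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Return the relative frequency of every possible k-mer (20**k values).

    Frequencies are normalised by the number of length-k windows in the
    sequence (an illustrative choice, not taken from the paper).
    """
    sequence = sequence.upper()
    # All overlapping windows of length k; empty if the sequence is too short.
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Enumerate k-mers in lexicographic order so feature positions are stable.
    vocabulary = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return [counts[kmer] / total if total else 0.0 for kmer in vocabulary]


def peptide_features(sequence):
    """Concatenate mono- (k=1), di- (k=2), and tripeptide (k=3) compositions."""
    features = []
    for k in (1, 2, 3):
        features.extend(kmer_composition(sequence, k))
    return features


# Hypothetical usage on a toy peptide.
example = peptide_features("ACDA")
print(len(example))  # 20 + 400 + 8000 features
```

The resulting vector has 8,420 entries per sequence, which illustrates why a feature selection stage is applied before classification.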