BMC Bioinformatics (Dec 2023)

A novel and innovative cancer classification framework through a consecutive utilization of hybrid feature selection

  • Rajul Mahto,
  • Saboor Uddin Ahmed,
  • Rizwan ur Rahman,
  • Rabia Musheer Aziz,
  • Priyanka Roy,
  • Saurav Mallik,
  • Aimin Li,
  • Mohd Asif Shah

DOI
https://doi.org/10.1186/s12859-023-05605-5
Journal volume & issue
Vol. 24, no. 1
pp. 1 – 26

Abstract

Cancer prediction at an early stage is a topic of major interest in medicine, since it allows accurate and efficient actions for successful cancer treatment. Most cancer datasets contain many gene expression levels as features but few samples, so similar (redundant) features must first be eliminated to allow a faster convergence rate of classification algorithms. These features (genes) enable us to identify the cancer disease, choose the best prescription to prevent cancer, and discover deviations among different techniques. To address this problem, we propose a novel hybrid technique, CSSMO-based gene selection, for cancer classification. First, we altered the fitness of spider monkey optimization (SMO) with the cuckoo search algorithm (CSA), giving CSSMO, for feature selection; this combines the benefits of both metaheuristic algorithms to discover a subset of genes that helps predict cancer at an early stage. Further, to enhance the accuracy of the CSSMO algorithm, we apply a cleaning step, minimum redundancy maximum relevance (mRMR), to reduce the number of genes in the cancer datasets. Next, these gene subsets are classified using deep learning (DL) to identify the different groups or classes related to a particular cancer disease. Eight benchmark cancer microarray gene expression datasets were used to analyze the performance of the proposed approach with different evaluation metrics, such as recall, precision, F1-score, and the confusion matrix. The proposed gene selection method with DL achieves much better classification accuracy than other existing DL and machine learning classification models on all of the large cancer gene expression datasets.
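The mRMR cleaning step named in the abstract is a standard filter method: it greedily picks genes that are highly informative about the class label while penalizing genes that duplicate information already selected. The abstract does not give the authors' implementation, so the following is only a minimal sketch of the common "difference" form of mRMR (relevance minus mean redundancy, both measured by mutual information). It assumes the expression values have already been discretized (for example, binned), and the helper names `mutual_information` and `mrmr_select` are hypothetical. It does not implement CSSMO or the DL classifier.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(x)
    px = Counter(x)
    py = Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) )
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features, labels, k):
    """Greedy mRMR (difference form), a sketch rather than the paper's code.

    features: list of gene columns, each a list of discretized expression values
              (assumption: discretization is done beforehand).
    labels:   class label per sample.
    Returns the indices of up to k selected genes, in selection order.
    """
    # Relevance of each gene to the class label.
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                # First pick: the most relevant gene.
                return relevance[j]
            # Mean redundancy with the genes chosen so far.
            redundancy = sum(
                mutual_information(features[j], features[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        best = max(remaining, key=score)  # ties go to the lower index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    g0 = [0, 0, 0, 1, 1, 1, 1, 1]  # strongly relevant gene
    g1 = list(g0)                  # exact duplicate of g0 (fully redundant)
    g2 = [0, 0, 0, 1, 0, 1, 1, 1]  # less relevant but less redundant gene
    print(mrmr_select([g0, g1, g2], y, 2))  # [0, 2]
```

In this toy example a pure relevance ranking would pick genes 0 and 1, because both carry the same information about the label; the redundancy penalty instead skips the duplicate and keeps gene 2. On real microarray data the same idea shrinks thousands of genes to a short candidate list before a wrapper search runs.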

Keywords