IEEE Access (Jan 2024)

Gene Selection Based Cancer Classification With Adaptive Optimization Using Deep Learning Architecture

  • Anju Das,
  • N. Neelima,
  • K. Deepa,
  • Tolga Ozer

DOI
https://doi.org/10.1109/ACCESS.2024.3392633
Journal volume & issue
Vol. 12
pp. 62234 – 62255

Abstract

Early cancer identification using gene expression data is critical for successful patient care. Accurate classification is essential, because improper detection may result in added complications and increased mortality rates. Gene expression data typically include numerous features, each representing a distinct gene. This abundance of features introduces high dimensionality, which raises computational complexity and resource demands. Furthermore, redundant or highly correlated features may lead to multicollinearity issues. Existing works suffer from limitations such as reduced performance due to degraded data quality, high storage space requirements, overfitting, and lack of robustness, all of which can compromise overall classification accuracy. To address these challenges and enhance classification outcomes, this research employs an efficient framework based on a deep learning (DL) approach. First, data are collected from five cancer gene expression datasets and then augmented to increase the data size. Min-Max Normalization is used for data pre-processing. The Enhanced Chimp Optimization (ECO) algorithm is applied to select the most significant genes while eliminating redundant or unwanted ones. Based on the selected gene set, a Depth-wise Separable Convolutional Neural Network (DSCNN) is employed to classify diverse cancerous and non-cancerous classes. Resolving the dimensionality and overfitting problems greatly improves the performance of the proposed model. The implementation is carried out in Python, and the overall accuracy of the DSCNN model is 99.18% across all datasets. Additionally, precision, recall, and F1 score are evaluated to analyze the model's overall performance. The proposed model performs significantly better than existing approaches.
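
Two of the steps named in the abstract, Min-Max Normalization and the depth-wise separable convolution underlying the DSCNN, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the one-dimensional layout (channels by gene positions), and the toy inputs are assumptions made here for illustration only. The ECO gene selection step is not sketched, because the abstract does not describe how it works.

```python
import numpy as np


def min_max_normalize(X, eps=1e-12):
    """Scale each gene (column) of a samples-by-genes matrix to [0, 1].

    Hypothetical helper; constant columns are mapped to 0 to avoid
    division by zero.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    return (X - col_min) / np.where(col_range > eps, col_range, 1.0)


def depthwise_separable_conv1d(x, depthwise_kernels, pointwise_weights):
    """Depth-wise separable 1D convolution (no bias, no padding, stride 1).

    x:                 (channels, length)
    depthwise_kernels: (channels, kernel_size), one kernel per input channel
    pointwise_weights: (out_channels, channels), a 1x1 convolution
    """
    # Depth-wise step: each channel is filtered by its own kernel only.
    depthwise = np.stack([
        np.correlate(x[c], depthwise_kernels[c], mode="valid")
        for c in range(x.shape[0])
    ])
    # Point-wise step: mix channels independently at every position.
    return pointwise_weights @ depthwise


# Toy expression matrix: 3 samples, 2 genes (illustrative values only).
X_norm = min_max_normalize([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]])

# Toy feature map: 2 channels, 4 positions.
x = np.array([[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]])
dw = np.array([[1.0, 1.0], [1.0, -1.0]])
pw = np.array([[1.0, 1.0]])
y = depthwise_separable_conv1d(x, dw, pw)
```

Ignoring biases, a standard 1D convolution with kernel size k, C_in input channels, and C_out output channels has k * C_in * C_out weights, whereas the depth-wise separable form has k * C_in + C_in * C_out. That reduction in parameters is the usual motivation for choosing this layer type.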

Keywords