International Journal of Telemedicine and Applications (Jan 2024)
Ensemble Classification Model With CFS-IGWO–Based Feature Selection for Cancer Detection Using Microarray Data
Abstract
Cancer is a leading cause of death worldwide, and machine learning (ML) has become an important tool for early cancer detection, which helps lower the death toll. ML-based models for cancer diagnosis are built using two forms of data: gene expression data and microarray data. Gene expression data has many dimensions, and the efficiency of an ML-based model decreases as the dimensionality of the data grows. Microarray data is likewise characterized by high dimensionality, with a large number of features and a small sample size. In this work, two ensemble techniques are proposed, one based on majority voting and one based on weighted averaging. Correlation feature selection (CFS) is used for feature selection, and an improved grey wolf optimizer (IGWO) is used for feature optimization. Support vector machines (SVMs), multilayer perceptron (MLP), logistic regression (LR), decision tree (DT), adaptive boosting (AdaBoost), extreme learning machines (ELMs), and K-nearest neighbor (KNN) are used as base classifiers. The outputs of the individual base learners are then combined using the weighted average and majority voting ensemble methods. Accuracy (ACC), specificity (SPE), sensitivity (SEN), precision (PRE), Matthews correlation coefficient (MCC), and F1-score (F1-S) are used to assess performance. Our results show that majority voting achieves better performance than the weighted average ensemble technique.
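The two combination rules and the evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a binary task with labels 0 and 1, the function names (`majority_vote`, `weighted_average`, `binary_metrics`), the 0.5 decision threshold, the model weights, and the toy predictions are all hypothetical, and the CFS and IGWO stages are omitted. The toy numbers are made up and say nothing about the results reported in the paper.

```python
from collections import Counter
from math import sqrt


def majority_vote(label_predictions):
    """Hard voting: label_predictions[m][i] is model m's label for sample i."""
    combined = []
    for votes in zip(*label_predictions):
        # With an odd number of models and binary labels there are no ties;
        # otherwise Counter keeps the label that was seen first.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def weighted_average(prob_predictions, weights, threshold=0.5):
    """Soft voting: prob_predictions[m][i] is model m's P(class 1) for sample i."""
    total = sum(weights)
    combined = []
    for probs in zip(*prob_predictions):
        score = sum(w * p for w, p in zip(weights, probs)) / total
        combined.append(1 if score >= threshold else 0)
    return combined


def binary_metrics(y_true, y_pred):
    """ACC, SEN, SPE, PRE, F1-S and MCC from the binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": ratio(tp + tn, len(pairs)),
        "SEN": ratio(tp, tp + fn),
        "SPE": ratio(tn, tn + fp),
        "PRE": ratio(tp, tp + fp),
        "F1-S": ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical outputs of three base learners on four samples.
labels = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
probs = [[0.9, 0.2, 0.6, 0.1], [0.8, 0.7, 0.4, 0.3], [0.3, 0.6, 0.7, 0.2]]
weights = [0.5, 0.3, 0.2]  # assumed weights, e.g. set from validation accuracy
y_true = [1, 0, 1, 0]

for name, y_pred in [
    ("majority voting", majority_vote(labels)),
    ("weighted average", weighted_average(probs, weights)),
]:
    print(name, y_pred, binary_metrics(y_true, y_pred))
```

Majority voting uses only the predicted labels, whereas the weighted average needs a per-model score and a choice of weights, so the two rules can disagree on the same base learners.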