Discover Artificial Intelligence (Jan 2025)

An improved soft voting-based machine learning technique to detect breast cancer utilizing effective feature selection and SMOTE-ENN class balancing

  • Indu Chhillar,
  • Ajmer Singh

DOI
https://doi.org/10.1007/s44163-025-00224-w
Journal volume & issue
Vol. 5, no. 1
pp. 1 – 19

Abstract

Breast cancer is a leading cause of cancer death among women globally, and it is becoming more prevalent. Early detection and precise diagnosis can reduce the disease’s mortality rate, and recent advances in machine learning have helped in this regard. However, machine learning algorithms cannot deliver the intended results when the dataset contains redundant or irrelevant features. To address this issue, the study applies a series of preprocessing strategies: the Robust Scaler for data scaling, Synthetic Minority Over-sampling Technique-Edited Nearest Neighbor (SMOTE-ENN) for class balancing, and the Boruta and Coefficient-Based Feature Selection (CBFS) methods for feature selection. For more accurate and reliable breast cancer classification, the study proposes a soft voting-based ensemble model that combines three diverse classifiers: Multilayer Perceptron (MLP), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost). To demonstrate its effectiveness, the proposed ensemble is compared with its base classifiers on the Wisconsin Diagnosis Breast Cancer Dataset (WDBC). When trained on the optimal features obtained from the CBFS method, the soft voting classifier achieved an accuracy of 99.42%, precision of 100.0%, recall of 98.41%, F1 score of 99.20%, and AUC of 1.0. Under tenfold cross-validation (10-FCV), it achieved a mean accuracy of 99.34%. A comprehensive analysis of the results indicated that the suggested technique outperformed existing state-of-the-art methods, which the authors attribute to efficient data preprocessing, feature selection, and ensembling.
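
The soft voting step named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function name `soft_vote`, the equal default weights, and all probability values below are assumptions made up for illustration. In soft voting, each base classifier outputs class probabilities, the probabilities are averaged (optionally with weights), and the class with the highest average is predicted.

```python
def soft_vote(probas, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    probas[m][i][c] is the probability that model m assigns to class c
    for sample i. Returns the index of the winning class per sample.
    """
    n_models = len(probas)
    if weights is None:
        weights = [1.0] * n_models  # assumption: equal weights by default
    total = float(sum(weights))
    n_samples = len(probas[0])
    n_classes = len(probas[0][0])

    labels = []
    for i in range(n_samples):
        # Weighted mean probability of each class across all models.
        avg = [
            sum(w * probas[m][i][c] for m, w in enumerate(weights)) / total
            for c in range(n_classes)
        ]
        # Predict the class with the highest averaged probability.
        labels.append(max(range(n_classes), key=avg.__getitem__))
    return labels


# Hypothetical outputs [P(benign), P(malignant)] for three samples.
# These numbers are invented; they are not results from the paper.
mlp = [[0.90, 0.10], [0.40, 0.60], [0.55, 0.45]]
svm = [[0.80, 0.20], [0.30, 0.70], [0.52, 0.48]]
xgb = [[0.95, 0.05], [0.20, 0.80], [0.10, 0.90]]

predictions = soft_vote([mlp, svm, xgb])
print(predictions)
```

For the third sample, two of the three models lean slightly toward benign, so a simple majority (hard) vote would pick benign, while the averaged probabilities favor malignant because the third model is far more confident. In scikit-learn, the same idea is available as `VotingClassifier` with `voting="soft"`.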

Keywords