PLoS ONE (Jan 2024)

Performance assessment of hybrid machine learning approaches for breast cancer and recurrence prediction.

  • Abhilash Pati,
  • Amrutanshu Panigrahi,
  • Manoranjan Parhi,
  • Jayant Giri,
  • Hong Qin,
  • Saurav Mallik,
  • Sambit Ranjan Pattanayak,
  • Umang Kumar Agrawal

DOI
https://doi.org/10.1371/journal.pone.0304768
Journal volume & issue
Vol. 19, no. 8
p. e0304768

Abstract

Read online

Breast cancer is a major health concern for women everywhere and a major killer of women. Malignant tumors may be distinguished from benign ones, allowing for early diagnosis of this disease. Therefore, doctors need an accurate method of diagnosing tumors as either malignant or benign. Even if therapy begins immediately after diagnosis, some cancer cells may persist in the body, increasing the risk of a recurrence. Metastasis and recurrence are the leading causes of death from breast cancer. Therefore, detecting a return of breast cancer early has become a pressing medical issue. Evaluating and contrasting various Machine Learning (ML) techniques for breast cancer and recurrence prediction is crucial to choosing the best successful method. Inaccurate forecasts are common when using datasets with a large number of attributes. This study addresses the need for effective feature selection and optimization methods by introducing Recursive Feature Elimination (RFE) and Grey Wolf Optimizer (GWO), in response to the limitations observed in existing approaches. In this research, the performance evaluation of methods is enhanced by employing the RFE and GWO, considering the Wisconsin Diagnostic Breast Cancer (WDBC) and Wisconsin Prognostic Breast Cancer (WPBC) datasets taken from the UCI-ML repository. Various preprocessing techniques are applied to raw data, including imputation, scaling, and others. In the second step, relevant feature correlations are used with RFE to narrow down candidate discriminative features. The GWO chooses the best possible combination of attributes for the most accurate result in the next step. We use seven ML classifiers in both datasets to make a binary decision. On the WDBC and WPBC datasets, several experiments have shown accuracies of 98.25% and 93.27%, precisions of 98.13% and 95.56%, sensitivities of 99.06% and 96.63%, specificities of 96.92% and 73.33%, F1-scores of 98.59% and 96.09% and AUCs of 0.982 and 0.936, respectively. The hybrid approach's superior feature selection improved the accuracy of breast cancer performance indicators and recurrence classification.