IEEE Access (Jan 2023)

Auxiliary Diagnosis of Breast Cancer Based on Machine Learning and Hybrid Strategy

  • Hua Chen,
  • Kehui Mei,
  • Yuan Zhou,
  • Nan Wang,
  • Guangxing Cai

DOI
https://doi.org/10.1109/ACCESS.2023.3312305
Journal volume & issue
Vol. 11
pp. 96374 – 96386

Abstract

Breast cancer has replaced lung cancer as the most common cancer among women worldwide. In this paper, we take breast cancer as the research object and introduce a hybrid data-processing strategy, which we combine with machine learning methods to build a more accurate and efficient auxiliary diagnosis model. First, the combined sampling method SMOTE-ENN is used to address class imbalance, and the data are standardized to improve their separability. Then, the features of the dataset are initially screened using the mutual information method, and a second round of feature selection is performed using recursive feature elimination based on the XGBoost algorithm. This reduces the feature dimensionality of the dataset and improves the generalization ability of the model. Finally, five different machine learning models are used for classification; the best parameter combination for each model is found by grid search, and the final results for each model are obtained by 10-fold cross-validation. Experiments on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset show that, after the data are processed by the hybrid strategy, the random forest (RF) model gives the best prediction results, with 99.52% accuracy, which is better than previously reported methods.

Keywords