IEEE Access (Jan 2024)

Breast Carcinoma Prediction Through Integration of Machine Learning Models

  • Rosmeri Martinez-Licort
  • Carlos de la Cruz Leon
  • Deevyankar Agarwal
  • Benjamin Sahelices
  • Isabel de la Torre
  • Jose Pablo Miramontes-Gonzalez
  • Mohammed Amoon

DOI
https://doi.org/10.1109/ACCESS.2024.3431998
Journal volume & issue
Vol. 12
pp. 134635 – 134650

Abstract

Breast cancer poses a global health challenge, with high incidence and mortality rates. Early detection and precise diagnosis are crucial for patient prognosis. Machine learning (ML) models applied to mammary biopsy image data hold promise for efficient and accurate breast cancer diagnosis. In this study, we evaluate the performance of several ML algorithms, including Logistic Regression (LR), Random Forest (RF), Naive Bayes (NB) and Support Vector Machine (SVM). We define the evaluation contexts by standardizing the data and reducing the correlation between variables. First, we select the best-performing parameters for each algorithm by building and evaluating the individual models. Then, we build a combined model using weighted voting, in which the weight of each model is determined by its performance on the test dataset. The final model combines the LR, RF and SVM models. SVM is the best-performing individual model, so it receives the highest weight in the final model. The final integrated model achieves an accuracy of 98%, a precision of 97%, a recall of 99%, an F1-score of 98% and an AUC of 0.98. Our weighted voting model compares favourably with the other models analysed. The approach is efficient and transparent in handling structured medical data. It is a prototype that will be refined and extended to larger real-world datasets.
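The abstract does not specify whether the voting uses hard labels or predicted probabilities, how the correlation between variables is reduced, or exactly how performance is turned into weights. The sketch below is a plain-Python illustration under stated assumptions, not the authors' implementation: z-score standardization; a greedy filter (assumed here) that drops any feature whose absolute Pearson correlation with an already kept feature exceeds a hypothetical threshold of 0.9; and weighted soft voting in which each model's weight is its score divided by the sum of all scores. Only LR, RF and SVM appear in the vote, matching the final model described above. The function names (`standardize`, `pearson`, `drop_correlated`, `weighted_soft_vote`, `binary_metrics`), the 0.5 cutoff and every number in the usage section are hypothetical; AUC is omitted because it needs the full ranking of scores rather than thresholded labels.

```python
import math
from statistics import mean, pstdev


def standardize(columns):
    """Z-score each feature column: (x - mean) / population standard deviation."""
    result = []
    for col in columns:
        mu, sd = mean(col), pstdev(col)
        # A constant column carries no information; map it to all zeros.
        result.append([(x - mu) / sd if sd > 0 else 0.0 for x in col])
    return result


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    # Treat a constant column as uncorrelated with everything.
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def drop_correlated(columns, threshold=0.9):
    """Hypothetical greedy filter: keep a feature only if |r| <= threshold
    against every feature kept before it. Returns the kept column indices."""
    kept = []
    for i, col in enumerate(columns):
        if all(abs(pearson(col, columns[j])) <= threshold for j in kept):
            kept.append(i)
    return kept


def weighted_soft_vote(probas, scores, cutoff=0.5):
    """Weighted soft voting over per-model P(malignant).

    probas: model name -> list of predicted probabilities (one per sample)
    scores: model name -> performance score; weights = scores normalised to sum to 1
    """
    total = sum(scores.values())
    weights = {name: s / total for name, s in scores.items()}
    n = len(next(iter(probas.values())))
    combined = [sum(weights[m] * probas[m][i] for m in probas) for i in range(n)]
    return [1 if p >= cutoff else 0 for p in combined], weights


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with 1 (malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical feature columns: the second is a rescaled copy of the first.
features = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 1, 3, 2]]
print(drop_correlated(standardize(features)))  # [0, 2]

# Hypothetical probabilities and scores; in the abstract, weights come from
# each model's performance on the test dataset.
y_true = [1, 0, 1, 1, 0, 0]
probas = {
    "LR": [0.80, 0.30, 0.40, 0.90, 0.20, 0.60],
    "RF": [0.70, 0.40, 0.55, 0.85, 0.10, 0.45],
    "SVM": [0.90, 0.20, 0.65, 0.95, 0.30, 0.40],
}
scores = {"LR": 0.95, "RF": 0.94, "SVM": 0.97}
y_pred, weights = weighted_soft_vote(probas, scores)
print(weights)
print(y_pred)  # [1, 0, 1, 1, 0, 0]
print(binary_metrics(y_true, y_pred))
```

With these made-up inputs SVM receives the largest weight because it has the largest score, in line with the weighting rule the abstract describes. LR on its own would mislabel the third and sixth cases at the 0.5 cutoff, while the weighted combination labels all six correctly.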

Keywords