Applied Sciences (Sep 2024)

Enhancing Breast Cancer Risk Prediction with Machine Learning: Integrating BMI, Smoking Habits, Hormonal Dynamics, and BRCA Gene Mutations—A Game-Changer Compared to Traditional Statistical Models?

  • Luana Conte,
  • Emanuele Rizzo,
  • Emanuela Civino,
  • Paolo Tarantino,
  • Giorgio De Nunzio,
  • Elisabetta De Matteis

DOI
https://doi.org/10.3390/app14188474
Journal volume & issue
Vol. 14, no. 18
p. 8474

Abstract

Read online

The association between genetics and lifestyle factors is crucial when determining breast cancer susceptibility, a leading cause of deaths globally. This research aimed to compare the body mass index, smoking behavior, hormonal influences, and BRCA gene mutations between affected patients and healthy individuals, all with a family history of cancer. All these factors were then utilized as features to train a machine learning (ML) model to predict the risk of breast cancer development. Between 2020 and 2023, a total of 1389 women provided detailed lifestyle and risk factor data during visits to a familial cancer center in Italy. Descriptive and inferential statistics were assessed to explore the differences between the groups. Among the various classifiers used, the ensemble of decision trees was the best performer, with a 10-fold cross-validation scheme for training after normalizing the features. The performance of the model was evaluated using the receiver operating characteristic (ROC) curve and its area under the curve (AUC), alongside the accuracy, sensitivity, specificity, precision, and F1 score. Analysis revealed that individuals in the tumor group exhibited a higher risk profile when compared to their healthy counterparts, particularly in terms of the lifestyle and genetic markers. The ML model demonstrated predictive power, with an AUC of 81%, 88% sensitivity, 57% specificity, 78% accuracy, 80% precision, and an F1 score of 0.84. These metrics significantly outperformed traditional statistical prediction models, including the BOADICEA and BCRAT, which showed an AUC below 0.65. This study demonstrated the efficacy of an ML approach in identifying women at higher risk of breast cancer, leveraging lifestyle and genetic factors, with an improved predictive performance over traditional methods.

Keywords