PLoS ONE (Jan 2024)

Epidemiological breast cancer prediction by country: A novel machine learning approach.

  • Hasna El Haji,
  • Nada Sbihi,
  • Bassma Guermah,
  • Amine Souadka,
  • Mounir Ghogho

DOI
https://doi.org/10.1371/journal.pone.0308905
Journal volume & issue
Vol. 19, no. 8
p. e0308905

Abstract

Breast cancer remains a significant contributor to cancer-related deaths among women globally. This study examines the correlation between breast cancer incidence rates and newly identified risk factors, and uses machine learning models to predict breast cancer incidence at the country level. Following an extensive review of the available literature, we identified a range of recently studied risk factors associated with breast cancer. We then gathered data on these factors and on breast cancer incidence rates from numerous online sources covering 151 countries. To evaluate the relationship between these factors and breast cancer incidence, we assessed the normality of the data and conducted Spearman's correlation test. We also refined six regression models to forecast future breast cancer incidence rates. Our findings indicate that breast cancer incidence is most positively correlated with the average age of women in a country, as well as with meat consumption, CO2 emissions, depression, sugar consumption, tobacco use, milk intake, mobile cells, alcohol consumption, pesticides, and oral contraceptive use. As for prediction, the CatBoost Regressor predicted future breast cancer incidence with an R squared value of 0.84 ± 0.03. Increased breast cancer incidence is mainly associated with dietary habits and lifestyle. Our findings and recommendations can serve as a baseline for developing educational programs that raise awareness among women in countries at heightened risk.
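The two statistics named in the abstract, Spearman's rank correlation and R squared, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the implementation uses only the Python standard library, and the numbers are made-up toy values rather than data from the study. Spearman's coefficient is computed here as the Pearson correlation of the ranks, with tied values sharing their average rank.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values for five hypothetical countries (NOT data from the study).
toy_factor = [18, 25, 30, 38, 42]      # e.g. a risk-factor level per country
toy_incidence = [20, 35, 33, 70, 90]   # e.g. an incidence rate per country

print(round(spearman(toy_factor, toy_incidence), 3))
print(r_squared([1, 2, 3], [1.1, 1.9, 3.2]))
```

Spearman's coefficient depends only on rank order, so it does not require normally distributed data, which is consistent with running it after a normality assessment. The R squared of 0.84 reported above comes from the CatBoost Regressor, which this sketch does not reproduce.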