Scientific Reports (Dec 2024)
Application of machine learning in breast cancer survival prediction using a multimethod approach
Abstract
Breast cancer is one of the most prevalent cancers, with both incidence and mortality rates increasing in Iran. Survival analysis is pivotal in setting appropriate care plans. To the best of our knowledge, this is the first study in Iran to use a multi-method approach, combining a Deep Neural Network (DNN) and 11 conventional machine learning (ML) methods, to predict the 5-year survival of women with breast cancer. The study is further distinguished by its use of data from two centers, comprising 2644 records in total, and by its external validation. Thirty-four features were selected based on a literature review and the variables common to both datasets. Feature selection was also performed using a p-value criterion (< 0.05) and a survey of oncologists. A total of 108 models were trained. In external validation, the DNN model trained on the Shiraz dataset with all features achieved the highest accuracy (85.56%). Although the DNN model showed superior accuracy in external validation, it did not consistently achieve the highest performance across all evaluation metrics. Notably, models trained on the Shiraz dataset outperformed those trained on the Tehran dataset, possibly because the Shiraz dataset had fewer missing values.
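The observation that the most accurate model is not necessarily the best on every metric can be illustrated with a minimal sketch. The labels and predictions below are invented for illustration and are not the study's data; the choice of sensitivity and specificity as the additional metrics is an assumption, since the abstract does not list which metrics were used.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = survived)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return accuracy, sensitivity and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical external-validation labels: 1 = survived 5 years, 0 = did not.
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

# Hypothetical model A predicts survival for everyone.
model_a = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Hypothetical model B misses one survivor but identifies both non-survivors.
model_b = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

metrics_a = evaluate(y_true, model_a)
metrics_b = evaluate(y_true, model_b)

print("A:", metrics_a)
print("B:", metrics_b)
```

In this toy case model B has the higher accuracy, while model A has the higher sensitivity, so ranking models by accuracy alone would hide the trade-off.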
Keywords