Tehran University Medical Journal (May 2021)

Using data mining techniques for predicting the survival rate of breast cancer patients: a review article

  • Hossein Bagherian,
  • Shaghayegh Haghjooy Javanmard,
  • Mehran Sharifi,
  • Mohammad Sattari

Journal volume & issue
Vol. 79, no. 3
pp. 176 – 186

Abstract

This review was conducted between December 2018 and March 2019 at Isfahan University of Medical Sciences. It examined which data mining techniques have been used to predict the survival of breast cancer patients, which risk factors those predictions relied on, which criteria were used to evaluate the techniques, and which data sources were used. The review follows the PRISMA statement and covers studies on predicting the survival of breast cancer patients using data mining techniques, published from 2005 to 2018 in databases such as Medline, Science Direct, Web of Science, Embase, and Scopus. The database search retrieved 527 articles. After duplicates were removed and the articles were evaluated, 21 articles were included. Logistic regression, decision tree, and support vector machine were the three most frequently used techniques. Age, tumor grade, tumor stage, and tumor size were used more often than other risk factors. Among the evaluation criteria, accuracy was used in the most studies. Most of the studies used the Surveillance, Epidemiology, and End Results Program (SEER) dataset. In survival prediction, classification techniques typically receive the most attention because they suit the problem well. Accordingly, decision tree, logistic regression, and support vector machine were used in more studies than other techniques. These techniques can give clinicians a good basis for evaluating the effectiveness of different treatments and the impact of each on patients' longevity and survival.
If the output of these techniques is used as the input to a decision support system, clinicians can supply the patient's risk factors, age, and physical condition when providing services to breast cancer patients. Using the outputs of the decision support system, they can then choose the best treatment method and consequently increase patient survival.
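
The abstract names accuracy as the evaluation criterion used in the most studies. As a minimal illustration of what that criterion measures, the Python sketch below computes accuracy for a binary survival classifier. The function name `accuracy` and the eight patient labels are hypothetical and chosen only for illustration; they are not taken from the reviewed studies or from the SEER dataset.

```python
def accuracy(actual, predicted):
    """Return the fraction of cases whose predicted label matches the recorded one."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one case is required")
    # Count the cases where the classifier's label agrees with the recorded outcome.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels for eight patients: 1 = survived, 0 = did not survive.
actual = [1, 1, 0, 1, 0, 0, 1, 1]
predicted = [1, 0, 0, 1, 0, 1, 1, 1]

score = accuracy(actual, predicted)
print(f"accuracy = {score:.2f}")  # 6 of 8 labels match, so this prints accuracy = 0.75
```

The same function applies to the output of any of the classifiers mentioned above, since it only compares predicted labels with recorded outcomes.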

Keywords