Frontiers in Cell and Developmental Biology (Dec 2022)

A machine learning–based model to predict early death among bone metastatic breast cancer patients: A large cohort of 16,189 patients

  • Fan Xiong,
  • Xuyong Cao,
  • Xiaolin Shi,
  • Ze Long,
  • Yaosheng Liu,
  • Mingxing Lei

DOI
https://doi.org/10.3389/fcell.2022.1059597
Journal volume & issue
Vol. 10

Abstract

Purpose: This study aimed to develop a machine learning prediction model that categorizes the risk of early death among breast cancer patients with bone metastases.

Methods: This study examined 16,189 bone metastatic breast cancer patients between 2010 and 2019 from a large oncological database in the United States. The patients were randomly divided into two groups in a 90:10 ratio. The training group (n = 14,582, 90%) was used to train and optimize the prediction models, and the validation group (n = 1,607, 10%) was used to validate them. Four models were evaluated: logistic regression, gradient boosting tree, decision tree, and random forest.

Results: Early death accounted for 17.4% of all included patients. Multivariate analysis demonstrated that older age; a separated, divorced, or widowed marital status; residence in nonmetropolitan counties; brain metastasis; liver metastasis; lung metastasis; and a histologic type of unspecified neoplasms were significantly associated with a higher risk of early death, whereas a lower grade, a positive estrogen receptor (ER) status, cancer-directed surgery, radiation, and chemotherapy were significant protective factors. These 12 variables were used to develop the prediction models. Among the four models, the gradient boosting tree had the highest AUC [0.829, 95% confidence interval (CI): 0.802–0.856], followed by the random forest (0.828, 95% CI: 0.801–0.855) and logistic regression (0.819, 95% CI: 0.791–0.847) models. The discrimination slopes for these three models were 0.258, 0.223, and 0.240, respectively, and the corresponding accuracy rates were 0.801, 0.770, and 0.762. The Brier score of the gradient boosting tree was the lowest (0.109), followed by the random forest (0.111) and logistic regression (0.112) models. Risk stratification showed that patients in the high-risk group (46.31%) had a more than six-fold higher risk of early death than those in the low-risk group (7.50%).

Conclusion: The gradient boosting tree model demonstrated promising performance, with favorable discrimination and calibration, and can stratify the risk of early death among bone metastatic breast cancer patients.
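
For readers unfamiliar with the performance measures reported in the abstract (AUC, discrimination slope, accuracy, and Brier score), the following is a minimal Python sketch of how such measures are commonly defined. It is not the authors' code. The function names, the 0.5 classification threshold, and the toy labels and probabilities are assumptions made purely for illustration; none of the values come from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def discrimination_slope(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean predicted probability among events minus that among non-events."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)


def auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Probability that a random event is ranked above a random non-event (ties count half)."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in events
        for n in non_events
    )
    return wins / (len(events) * len(non_events))


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of patients classified correctly at an assumed probability threshold."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)


if __name__ == "__main__":
    # Toy data (hypothetical): 1 = early death, 0 = no early death.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    probs = [0.9, 0.6, 0.3, 0.4, 0.2, 0.1, 0.1, 0.05]
    print("AUC:", round(auc(labels, probs), 3))
    print("Discrimination slope:", round(discrimination_slope(labels, probs), 3))
    print("Accuracy:", round(accuracy(labels, probs), 3))
    print("Brier score:", round(brier_score(labels, probs), 3))
```

Under these definitions, a higher AUC and discrimination slope indicate better separation between patients who did and did not die early, while a lower Brier score indicates predicted probabilities that lie closer to the observed outcomes.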

Keywords