Discover Oncology (Dec 2024)

Novel models by machine learning to predict the risk of cardiac disease-specific death in young patients with breast cancer

  • Yi Li,
  • Handong Li,
  • Xuan Ye,
  • Zhigang Zhu,
  • Yixuan Qiu

DOI
https://doi.org/10.1007/s12672-024-01676-9
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 13

Abstract

Background: With major advances in adjuvant therapies, breast cancer (BC)-related deaths have decreased significantly. Increasing attention has been paid to the effect of cardiac disease on BC survivors, but few population-based studies have focused on young patients.

Method: Data on BC patients aged less than 50 years were collected from the SEER database. A competing risk model was used to analyze the effects of clinicopathological variables on the risk of cardiac disease-specific death (CDSD) in these patients. An XGBoost prediction model was then constructed to predict the risk of CDSD. Prediction performance was assessed using receiver operating characteristic (ROC) analysis, area under the ROC curve (AUC) values, calibration curves, decision curves, and confusion matrices, and SHapley Additive exPlanations (SHAP) were used to interpret the models.

Results: The competing risk analysis showed that young BC patients with older age, low household income, non-metropolitan residence, black race, unmarried status, HR+ subtype, higher T stage (T2-4), receipt of chemotherapy, and no surgery are at higher risk of CDSD. Five machine learning models were constructed to predict the CDSD risk of young BC patients, among which the XGBoost model showed the highest AUC value (train set: AUC = 0.846; test set: AUC = 0.836). The confusion matrix of the XGBoost model showed that the sensitivity, specificity, and accuracy were 0.81, 0.94, and 0.94 for the train set, and 0.82, 0.95, and 0.96 for the test set, respectively. The SHAP graph indicated that median household income, marital status, race, and age at diagnosis were the four strongest predictors.

Conclusion: Independent CDSD risk factors for young BC patients were identified, and machine-learning prognostic models were constructed to predict their CDSD risk. The validation results indicated that the probability predicted by the XGBoost model agrees well with the actual CDSD risk, so the model can help identify high-risk populations and support the development of effective cardioprotection strategies. The authors hope that these findings can support the growth of the new field of cardio-oncology.
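For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how sensitivity, specificity, overall accuracy (assumed here to be the third confusion-matrix figure reported alongside sensitivity and specificity), and AUC are conventionally computed for a binary outcome. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels, where 1 marks the event (e.g. CDSD)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def summary_metrics(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true events that were flagged
    specificity = tn / (tn + fp)  # share of non-events correctly left unflagged
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random event case outscores a random
    non-event case, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 1 = event, 0 = no event.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.4, 0.05]

# Assumed decision threshold of 0.5 for turning predicted risk into a class label.
toy_pred = [1 if s >= 0.5 else 0 for s in toy_scores]

tp, fp, tn, fn = confusion_counts(toy_labels, toy_pred)
sens, spec, acc = summary_metrics(tp, fp, tn, fn)
toy_auc = auc(toy_labels, toy_scores)
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason both kinds of figure are typically reported together.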

Keywords