Breast Cancer Research (Oct 2024)

Predicting pathologic complete response to neoadjuvant chemotherapy in breast cancer using a machine learning approach

  • Fangyuan Zhao,
  • Eric Polley,
  • Julian McClellan,
  • Frederick Howard,
  • Olufunmilayo I. Olopade,
  • Dezheng Huo

DOI
https://doi.org/10.1186/s13058-024-01905-7
Journal volume & issue
Vol. 26, no. 1
pp. 1 – 12

Abstract

Background For patients with breast cancer undergoing neoadjuvant chemotherapy (NACT), most of the existing prediction models of pathologic complete response (pCR) using clinicopathological features were based on standard statistical models like logistic regression, while models based on machine learning mostly utilized imaging data and/or gene expression data. This study aims to develop a robust and accessible machine learning model to predict pCR using clinicopathological features alone, which can be used to facilitate clinical decision-making in diverse settings.
Methods The model was developed and validated within the National Cancer Data Base (NCDB, 2018–2020) and an external cohort at the University of Chicago (2010–2020). We compared logistic regression and machine learning models, and examined whether incorporating quantitative clinicopathological features improved model performance. Decision curve analysis was conducted to assess the model’s clinical utility.
Results We identified 56,209 NCDB patients receiving NACT (pCR rate: 34.0%). The machine learning model incorporating quantitative clinicopathological features showed the best discrimination performance among all the fitted models [area under the receiver operating characteristic curve (AUC): 0.785, 95% confidence interval (CI): 0.778–0.792], along with outstanding calibration performance. The model performed best among patients with hormone receptor positive/human epidermal growth factor receptor 2 negative (HR+/HER2-) breast cancer (AUC: 0.817, 95% CI: 0.802–0.832); and by adopting a 7% prediction threshold, the model achieved 90.5% sensitivity and 48.8% specificity, with decision curve analysis finding a 23.1% net reduction in chemotherapy use. In the external testing set of 584 patients (pCR rate: 33.4%), the model maintained robust performance both overall (AUC: 0.711, 95% CI: 0.668–0.753) and in the HR+/HER2- subgroup (AUC: 0.810, 95% CI: 0.742–0.878).
Conclusions The study developed a machine learning model (https://huolab.cri.uchicago.edu/sample-apps/pcrmodel) to predict pCR in breast cancer patients undergoing NACT that demonstrated robust discrimination and calibration performance. The model performed particularly well among patients with HR+/HER2- breast cancer and has the potential to identify patients who are less likely to achieve pCR and who could consider alternative treatment strategies over chemotherapy. The model can also serve as a robust baseline that can be integrated with smaller datasets containing additional granular features in future research.
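
The threshold-based quantities reported in the Results (sensitivity, specificity, and a decision-curve "net reduction") can be illustrated with a minimal sketch. This is not the authors' code: the function name `threshold_metrics`, the toy labels, and the toy probabilities are hypothetical. The net-benefit and net-reduction formulas are the standard decision curve analysis definitions, and the study's exact computation may differ.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, and decision-curve quantities at one threshold.

    Assumption: label 1 means pCR was achieved, and a patient is "predicted
    positive" when the predicted pCR probability is >= threshold.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted_positive = prob >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1

    n = tp + fp + tn + fn
    odds = threshold / (1.0 - threshold)  # weight placed on a false positive

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Standard net benefit: true positives minus weighted false positives.
        "net_benefit": tp / n - (fp / n) * odds,
        # Standard net reduction in interventions per patient, relative to
        # treating everyone: true negatives minus weighted false negatives.
        "net_reduction": tn / n - (fn / n) / odds,
    }


# Hypothetical toy data (not from the study): 3 pCR and 7 non-pCR patients.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
toy_probs = [0.60, 0.30, 0.10, 0.40, 0.20, 0.06, 0.05, 0.03, 0.02, 0.01]

result = threshold_metrics(toy_labels, toy_probs, threshold=0.07)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A low threshold such as 7% trades specificity for sensitivity: few patients who would achieve pCR fall below it, so those who do fall below it are the ones for whom alternatives to chemotherapy might reasonably be discussed.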

Keywords