Scientific Reports (Jul 2024)

AI models predicting breast cancer distant metastasis using LightGBM with clinical blood markers and ultrasound maximum diameter

  • Yang Tan,
  • Wen-hai Zhang,
  • Zhen Huang,
  • Qi-xing Tan,
  • Yue-mei Zhang,
  • Chang-yuan Wei,
  • Zhen-Bo Feng

DOI
https://doi.org/10.1038/s41598-024-66658-x
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 12

Abstract

Breast cancer metastasis significantly impacts women's health globally. This study aimed to construct models that predict distant metastasis in breast cancer patients from clinical blood markers and ultrasound data, with the goal of keeping the models clinically applicable, cost-effective, relatively non-invasive, and accessible. Data from 416 patients across two centers were analyzed, focusing on clinical blood markers (tumor markers, liver and kidney function indicators, blood lipid markers, cardiovascular biomarkers) and maximum lesion diameter from ultrasound. Feature reduction was performed using Spearman correlation and LASSO regression. Two models were built using LightGBM: a clinical model (using clinical blood markers) and a combined model (incorporating clinical blood markers and ultrasound features). Both were evaluated in training, internal test, and external validation (test1) cohorts. Feature importance analysis was conducted for both models, followed by univariate and multivariate regression analyses of these features. The AUC values of the clinical model in the training, internal test, and external validation (test1) cohorts were 0.950, 0.795, and 0.883, respectively. The combined model showed AUC values of 0.955, 0.835, and 0.918 in the same three cohorts, respectively. Clinical utility curve analysis indicated that the combined model provided a superior net benefit in identifying breast cancer with distant metastasis across all cohorts. These results suggest that the combined model has superior discriminatory ability and generalizes well. Creatine kinase isoenzyme (CK-MB), CEA, CA153, albumin, creatine kinase, and maximum lesion diameter from ultrasound played significant roles in model prediction. CA153, CK-MB, lipoprotein (a), and maximum lesion diameter from ultrasound correlated positively with breast cancer distant metastasis, while indirect bilirubin and magnesium ions correlated negatively. This study successfully used clinical blood markers and ultrasound data to develop AI models for predicting distant metastasis in breast cancer. The combined model, incorporating clinical blood markers and ultrasound features, achieved higher AUC values than the clinical model in all three cohorts, suggesting its potential clinical utility in predicting and identifying breast cancer distant metastasis. These findings highlight the potential of cost-effective and accessible predictive tools in clinical oncology.
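
As an illustration of two steps named in the abstract, the sketch below shows one common way to use Spearman correlation for feature reduction (dropping a feature that is highly rank-correlated with one already kept) and how an AUC can be computed from labels and predicted scores. It is a minimal standard-library sketch, not the authors' pipeline: the 0.9 threshold, the function names, and the toy values are assumptions made for the example, and the LASSO and LightGBM steps are omitted.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (var_x * var_y) ** 0.5


def drop_redundant(features: Dict[str, Sequence[float]],
                   threshold: float = 0.9) -> List[str]:
    """Keep a feature only if |rho| with every kept feature is <= threshold.

    The threshold of 0.9 is an assumed value, not one stated in the abstract.
    """
    kept: List[str] = []
    for name, column in features.items():
        if all(abs(spearman(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum (Mann-Whitney U) identity; labels are 0 or 1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy values for illustration only; these are not data from the study.
toy_features = {
    "marker_a": [1, 2, 3, 4, 5],
    "marker_b": [2, 4, 6, 8, 11],   # same ordering as marker_a
    "marker_c": [5, 1, 4, 2, 3],
}
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
```

With the toy values, `drop_redundant(toy_features)` keeps `marker_a` and `marker_c`, since `marker_b` has the same rank ordering as `marker_a`, and `auc(toy_labels, toy_scores)` evaluates to 0.75. In a real analysis, library routines such as those in SciPy and scikit-learn would normally replace these hand-written helpers.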