Scientific Reports (May 2024)

Automated machine learning for predicting liver metastasis in patients with gastrointestinal stromal tumor: a SEER-based analysis

  • Luojie Liu,
  • Rufa Zhang,
  • Ying Shi,
  • Jinbing Sun,
  • Xiaodan Xu

DOI
https://doi.org/10.1038/s41598-024-62311-9
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 10

Abstract

Gastrointestinal stromal tumors (GISTs) are rare tumors that can develop liver metastasis (LIM), which significantly affects patient prognosis. This study aimed to predict LIM in GIST patients by building machine learning (ML) models to assist clinicians in treatment decision-making. A retrospective analysis was performed using the Surveillance, Epidemiology, and End Results (SEER) database: cases from 2010 to 2015 were assigned to the development sets, and cases from 2016 to 2017 were assigned to the test set. Missing values were handled with multiple imputation. Four algorithms were used to construct the models: traditional logistic regression (LR) and three automated machine learning (AutoML) algorithms, namely gradient boosting machine (GBM), deep neural network (DL), and generalized linear model (GLM). We evaluated model performance using LR-based metrics, including the area under the receiver operating characteristic curve (AUC), calibration curves, and decision curve analysis (DCA), as well as AutoML-based metrics, such as feature importance, SHapley Additive exPlanation (SHAP) plots, and Local Interpretable Model-Agnostic Explanation (LIME). A total of 6207 patients were included, with 2683, 1780, and 1744 patients allocated to the training, validation, and test sets, respectively. Among the models evaluated, the GBM model performed best in the training, validation, and test cohorts, with AUC values of 0.805, 0.780, and 0.795, respectively. The GBM model also outperformed the other AutoML models in accuracy, achieving 0.747, 0.700, and 0.706 in the training, validation, and test cohorts, respectively. Tumor size and tumor location were the most important predictors of LIM in the AutoML model.
The AutoML model based on the GBM algorithm can effectively predict the risk of LIM in GIST patients and provide clinicians with a reference for developing individualized treatment plans.
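The temporal split and the AUC metric described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the field names (`year_of_diagnosis`, `lim`, `risk`), the helper functions, and the six toy records are all hypothetical and exist only to show the mechanics. The AUC is computed as the proportion of positive/negative pairs in which the positive case receives the higher predicted risk, with ties counted as one half.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]


def temporal_split(
    records: Sequence[Record], year_key: str = "year_of_diagnosis"
) -> Tuple[List[Record], List[Record]]:
    """Assign 2010-2015 cases to development and 2016-2017 cases to test."""
    development = [r for r in records if 2010 <= r[year_key] <= 2015]
    test = [r for r in records if 2016 <= r[year_key] <= 2017]
    return development, test


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy records, NOT SEER data: "lim" is the outcome label and
# "risk" stands in for a model's predicted probability of liver metastasis.
records = [
    {"year_of_diagnosis": 2011, "lim": 0, "risk": 0.10},
    {"year_of_diagnosis": 2014, "lim": 1, "risk": 0.70},
    {"year_of_diagnosis": 2016, "lim": 0, "risk": 0.20},
    {"year_of_diagnosis": 2016, "lim": 1, "risk": 0.80},
    {"year_of_diagnosis": 2017, "lim": 0, "risk": 0.60},
    {"year_of_diagnosis": 2017, "lim": 1, "risk": 0.40},
]

development, test = temporal_split(records)
test_auc = roc_auc([int(r["lim"]) for r in test], [r["risk"] for r in test])
print(len(development), len(test), test_auc)
```

Splitting by diagnosis year rather than at random means the test cohort is drawn from a later period than the data used to fit the models, so the test AUC reflects performance on patients the models could not have seen.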

Keywords