Journal of Army Medical University (陆军军医大学学报), Nov 2024

Construction and validation of a prediction model for lymph node metastasis in early gastric cancer based on machine learning

  • MENG Xiangyong,
  • QIN Jiayi,
  • CHEN Wensheng

DOI
https://doi.org/10.16016/j.2097-0927.202403126
Journal volume & issue
Vol. 46, no. 21
pp. 2432 – 2442

Abstract


Objective: To construct an optimal machine learning (ML) model for predicting lymph node metastasis (LNM) in early gastric cancer (EGC) and to assess its predictive performance.

Methods: Clinical data were collected from 433 EGC patients who underwent radical surgery at our hospital between January 2015 and December 2022. The patients were divided into a training set and a validation set at a 7:3 ratio. LASSO regression was used to screen candidate variables, and multivariate logistic regression analysis was used to identify independent risk factors for LNM in the EGC patients. Ten ML models were constructed: categorical boosting (CatBoost), light gradient boosting machine (LightGBM), extreme gradient boosting (XGBoost), random forest (RF), gradient boosting machine (GBM), neural network (NNET), support vector machine (SVM), K-nearest neighbors (KNN), naive Bayes (NB), and logistic regression. The predictive performance of these models was evaluated and compared using accuracy, precision, recall, F1 score, sensitivity, specificity, positive predictive value, negative predictive value, Kappa value, area under the receiver operating characteristic curve (AUC), calibration curves, decision curves, and precision-recall curves. Finally, SHAP (SHapley Additive exPlanations) was applied to explain the contribution of each variable to the predictions of the best model.

Results: Depth of tumor invasion, lymphovascular invasion, and smoking history were independent risk factors for LNM in the EGC patients. The CatBoost model achieved the best predictive performance and outperformed the other models on five performance indicators in the training set: an AUC of 0.904 (95% CI 0.868-0.940), an F1 score of 0.633, a Brier score of 0.100, a negative predictive value of 0.975, and a Kappa value of 0.520.
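The abstract reports many evaluation metrics without defining them. Most are derived from the 2×2 confusion matrix at a classification threshold, while AUC and the Brier score use the predicted probabilities directly. The sketch below computes them with the Python standard library only. It assumes binary labels (1 = LNM, 0 = no LNM) and a fixed threshold of 0.5, which the abstract does not report; the helper names `evaluate` and `auc_mann_whitney` and the toy data are hypothetical and are not the authors' code or data. For a binary task, precision is the same quantity as positive predictive value and recall is the same as sensitivity.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float  # identical to positive predictive value (PPV)
    recall: float     # identical to sensitivity
    specificity: float
    npv: float        # negative predictive value
    f1: float
    kappa: float
    brier: float
    auc: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def auc_mann_whitney(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> BinaryMetrics:
    # ASSUMPTION: 0.5 threshold; the study's operating threshold is not stated.
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 1)
    tn = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 0)
    fp = sum(1 for y, yp in zip(y_true, y_pred) if y == 0 and yp == 1)
    fn = sum(1 for y, yp in zip(y_true, y_pred) if y == 1 and yp == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    npv = _safe_div(tn, tn + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)

    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = _safe_div(accuracy - p_expected, 1 - p_expected)

    # Brier score: mean squared error of the predicted probabilities (lower is better).
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n

    return BinaryMetrics(accuracy, precision, recall, specificity, npv, f1, kappa, brier,
                         auc_mann_whitney(y_true, y_prob))


if __name__ == "__main__":
    # Toy, made-up labels and probabilities for illustration only.
    labels = [1, 0, 1, 0, 0, 1, 0, 0]
    probs = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3, 0.05]
    print(evaluate(labels, probs))
```

On the toy data above, two of three positives are caught (recall 2/3), and 14 of the 15 positive-negative pairs are ranked correctly (AUC 14/15). The 95% CI reported for the AUC would additionally require a method such as DeLong's test or bootstrap resampling, which this sketch omits.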
SHAP values calculated for the CatBoost model revealed that depth of tumor invasion and lymphovascular invasion were the two key features for predicting LNM.

Conclusion: Tumor invasion into the submucosa, lymphovascular invasion, and smoking history are independent risk factors for LNM in early gastric cancer. ML can be used to predict LNM risk, and the CatBoost model, which showed the best predictive performance, can provide guidance for clinical diagnosis and treatment decisions.
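The abstract does not say how the SHAP values were computed. The sketch below illustrates the Shapley attribution that SHAP is built on by brute-force enumeration of feature coalitions against a single reference patient. This is a simplification: SHAP usually averages over a background dataset, and tree-specific algorithms such as TreeSHAP exist for ensembles like CatBoost. The model `toy_logit`, its coefficients, and the 0/1 feature encoding are hypothetical and are not fitted to the study data; exact enumeration needs 2^M model evaluations, so it only suits a handful of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    A feature is "present" when it takes its value from x and "absent" when it
    takes its value from baseline.
    """
    m = len(x)
    if len(baseline) != m:
        raise ValueError("x and baseline must have the same length")

    def value(coalition: frozenset[int]) -> float:
        # Evaluate the model with coalition features from x and the rest from baseline.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(m)]
        return model(mixed)

    phis = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        phi = 0.0
        for size in range(m):
            # Shapley weight |S|! (M - |S| - 1)! / M! for coalitions of this size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def toy_logit(features: Sequence[float]) -> float:
    # HYPOTHETICAL log-odds model for illustration only; coefficients are invented.
    depth_sm, lvi, smoking = features  # submucosal invasion, lymphovascular invasion, smoker (0/1)
    return -3.0 + 2.0 * depth_sm + 1.5 * lvi + 0.5 * smoking + 1.0 * depth_sm * lvi


if __name__ == "__main__":
    patient = [1.0, 1.0, 1.0]    # submucosal invasion, LVI present, smoker
    reference = [0.0, 0.0, 0.0]  # mucosal invasion, no LVI, non-smoker
    for name, phi in zip(["depth_sm", "lvi", "smoking"], exact_shapley(toy_logit, patient, reference)):
        print(f"{name}: {phi:+.3f}")
```

For this toy model the interaction term is split equally between invasion depth and lymphovascular invasion, giving attributions of 2.5, 2.0, and 0.5 on the log-odds scale. They sum to toy_logit(patient) - toy_logit(reference) = 5.0, which is the additivity property that lets SHAP rank each variable's contribution to an individual prediction.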

Keywords