Scientific Reports (Sep 2024)
Explainable machine learning model for predicting paratracheal lymph node metastasis in cN0 papillary thyroid cancer
Abstract
Prophylactic dissection of paratracheal lymph nodes in clinically lymph node-negative (cN0) papillary thyroid carcinoma (PTC) remains controversial. This study integrated preoperative and intraoperative variables to compare traditional nomograms with machine learning (ML) models, and to develop and validate an interpretable predictive model for paratracheal lymph node metastasis (PLNM) in cN0 PTC patients. We retrospectively selected 3213 PTC patients treated at the First Affiliated Hospital of Chongqing Medical University from 2016 to 2020 and randomly divided them into training and test datasets at a 7:3 ratio. A further 533 PTC patients treated at the Guangyuan Central Hospital from 2019 to 2022 served as an external test set. We developed and validated nine ML models using 10-fold cross-validation and grid search for hyperparameter tuning. Predictive performance was evaluated using ROC curves, decision curve analysis (DCA), calibration curves, and precision-recall curves. The best model was compared with a traditional logistic regression-based nomogram. The XGBoost model achieved AUC values of 0.935, 0.857, and 0.775 in the training, validation, and test sets, respectively, significantly outperforming the traditional nomogram model, which had AUCs of 0.85, 0.844, and 0.769, respectively. SHapley Additive exPlanations (SHAP)-based visualization identified the top 10 predictive features of the XGBoost model, and a web-based calculator was created based on these features. ML is a reliable tool for predicting PLNM in cN0 PTC patients. The SHAP method provides valuable insights into the XGBoost model, and the resultant web-based calculator is a clinically useful tool to assist in surgical planning for paratracheal lymph node dissection.
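The data-partitioning and evaluation steps named above (a 7:3 random split, 10-fold cross-validation, and AUC) can be illustrated with a minimal standard-library sketch. It is not the study's code: the function names, the random seed, the rounding of the 7:3 cut, and the toy labels and scores are all assumptions made for illustration, and the real pipeline would fit the nine ML models inside each fold.

```python
import random

def train_test_split_indices(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test lists.
    The seed and the rounding rule are illustrative assumptions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]

def k_fold(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# 3213 is the cohort size reported in the abstract; everything else is hypothetical.
train_idx, test_idx = train_test_split_indices(3213)
folds = list(k_fold(train_idx, k=10))

# Toy labels and scores, not study data.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In a grid search, each candidate hyperparameter setting would be scored by averaging `roc_auc` over the ten validation folds, and the held-out test indices would be used only once, for the final model.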