Journal of the National Cancer Center (Sep 2024)
Deep learning model based on primary tumor to predict lymph node status in clinical stage IA lung adenocarcinoma: a multicenter study
Abstract
Objective: To develop a deep learning model to predict lymph node (LN) status in patients with clinical stage IA lung adenocarcinoma.
Methods: This diagnostic study included 1,009 patients with pathologically confirmed clinical stage T1N0M0 lung adenocarcinoma, drawn from two independent datasets (699 from Cancer Hospital of Chinese Academy of Medical Sciences and 310 from PLA General Hospital) between January 2005 and December 2019. The Cancer Hospital dataset was randomly split into a training cohort (559 patients) and a validation cohort (140 patients) to train and tune a deep learning model based on a deep residual network (ResNet). The PLA Hospital dataset was used as a testing cohort to evaluate the generalization ability of the model. Thoracic radiologists manually segmented the tumors and interpreted high-resolution computed tomography (HRCT) features for the model. Predictive performance was assessed by area under the curve (AUC), accuracy, precision, recall, and F1 score. Subgroup analysis was performed to assess potential bias in the study population.
Results: A total of 1,009 patients were included in this study; 409 (40.5%) were male and 600 (59.5%) were female. The median age was 57.0 years (interquartile range, IQR: 50.0–64.0). For predicting pN0 disease, the deep learning model achieved AUCs of 0.906 (95% CI: 0.873–0.938) in the testing cohort and 0.893 (95% CI: 0.857–0.930) in a non-pure ground-glass nodule (non-pGGN) testing cohort, with no significant difference between the two cohorts (P = 0.622). The precision of the model for predicting pN0 disease was 0.979 (95% CI: 0.963–0.995) in the testing cohort and 0.983 (95% CI: 0.967–0.998) in the non-pGGN testing cohort. For predicting pN2 disease, the model achieved AUCs of 0.848 (95% CI: 0.798–0.898) in the testing cohort and 0.831 (95% CI: 0.776–0.887) in the non-pGGN testing cohort, again with no significant difference between the two cohorts (P = 0.657). The recall of the model for predicting pN2 disease was 0.903 (95% CI: 0.870–0.936) in the testing cohort and 0.931 (95% CI: 0.901–0.961) in the non-pGGN testing cohort.
Conclusions: The strong performance of the deep learning model could help tailor the extent of lymph node dissection and reduce ineffective lymph node dissection in patients with early-stage lung adenocarcinoma.
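For readers less familiar with the performance measures named in the Methods, the following is a minimal Python sketch of how AUC, accuracy, precision, recall, and F1 score can be computed for a binary task such as pN0 versus non-pN0. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are all assumptions chosen for illustration, and the AUC is computed with the standard rank-based (Mann-Whitney) formulation.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Precision: of the cases predicted positive, the fraction that truly are.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly positive cases, the fraction the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from the study): 1 = positive class.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.90, 0.80, 0.40, 0.30, 0.60, 0.70, 0.20, 0.55]
THRESHOLD = 0.5  # assumed cut-off; the abstract does not state one
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 3))
```

Precision and recall answer different clinical questions, which is presumably why the abstract reports precision for pN0 and recall for pN2: high precision for pN0 means few node-positive patients are mislabeled as node-negative, while high recall for pN2 means few pN2 cases are missed.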