BMC Medicine (Apr 2025)

Machine learning technique-based four-autoantibody test for early detection of esophageal squamous cell carcinoma: a multicenter, retrospective study with a nested case–control study

  • Yi-Wei Xu,
  • Yu-Hui Peng,
  • Can-Tong Liu,
  • Hao Chen,
  • Ling-Yu Chu,
  • Hai-Lu Chen,
  • Zhi-Yong Wu,
  • Wen-Qiang Wei,
  • Li-Yan Xu,
  • Fang-Cai Wu,
  • En-Min Li

DOI
https://doi.org/10.1186/s12916-025-04066-2
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 15

Abstract


Background
Autoantibodies are promising blood-based diagnostic biomarkers that may be produced before the first clinically detectable signs of cancer. In the present study, we aimed to identify a novel, optimized autoantibody panel with high diagnostic accuracy for clinical and preclinical esophageal squamous cell carcinoma (ESCC) using machine learning (ML) algorithms.

Methods
We identified candidate autoantibodies against tumor-associated antigens by serological proteome analysis and measured serum autoantibody levels by ELISA. Using a training set (n = 531), we constructed 102 models based on ML algorithms and selected the Partial Least Squares Generalized Linear Model (plsRglm) using receiver operating characteristic (ROC) analysis, the Kolmogorov–Smirnov (K-S) test, and the Population Stability Index (PSI). The model was further validated in an internal validation set (n = 413), external validation set 1 (n = 371), and external validation set 2 (n = 202). We then evaluated the ability of the plsRglm model to predict preclinical ESCC in a nested case–control study (24 preclinical ESCCs and 112 matched controls) within a population-based prospective cohort study.

Results
ROC analysis, the K-S test, and PSI showed that the plsRglm model based on four autoantibodies (ALDOA, ENO1, p53, and NY-ESO-1) exhibited the best diagnostic performance and robustness. It diagnosed ESCC with AUCs (sensitivities and specificities) of 0.860 (68.8% and 90.4%) in the training set, 0.826 (65.3% and 89.1%) in the internal validation set, and 0.851 (69.2% and 87.3%) in external validation set 1. For early-stage ESCC, the signature also maintained its diagnostic performance: 0.817 (62.3% and 90.4%) in the training set, 0.842 (62.5% and 89.1%) in the internal validation set, 0.854 (63.2% and 87.3%) in external validation set 1, and 0.850 (67.3% and 90.1%) in external validation set 2. In the nested case–control study, the plsRglm model detected preclinical ESCC with an AUC of 0.723, a sensitivity of 54.2%, and a specificity of 86.6%.

Conclusions
Our findings indicate that the plsRglm model based on four autoantibodies might help identify preclinical and early-stage ESCC.
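The abstract selects plsRglm among 102 candidate models using ROC analysis, the K-S test, and PSI, and reports performance as AUC, sensitivity, and specificity. It does not give the authors' code, their cutoff rule, or their PSI binning, so the following is only a minimal Python sketch of the standard textbook definitions of these quantities, not the study's implementation. The function names (roc_auc, sens_spec, ks_statistic, psi), the toy scores, the 0.5 cutoff, and the PSI cut points are all hypothetical.

```python
import bisect
import math
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate model scores into cases (label 1) and controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return cases, controls


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC = P(a random case scores higher than a random control); ties count as 1/2."""
    cases, controls = _split(scores, labels)
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def sens_spec(scores: Sequence[float], labels: Sequence[int], cutoff: float) -> Tuple[float, float]:
    """Sensitivity and specificity when score >= cutoff is called positive (hypothetical rule)."""
    cases, controls = _split(scores, labels)
    sensitivity = sum(s >= cutoff for s in cases) / len(cases)
    specificity = sum(s < cutoff for s in controls) / len(controls)
    return sensitivity, specificity


def ks_statistic(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Two-sample K-S statistic: largest gap between case and control empirical CDFs."""
    cases, controls = _split(scores, labels)
    gap = 0.0
    # Empirical CDFs only change at observed scores, so checking those points is enough.
    for t in sorted(set(scores)):
        cdf_cases = sum(s <= t for s in cases) / len(cases)
        cdf_controls = sum(s <= t for s in controls) / len(controls)
        gap = max(gap, abs(cdf_cases - cdf_controls))
    return gap


def psi(reference: Sequence[float], new: Sequence[float],
        cuts: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index of `new` scores against `reference` scores.

    `cuts` are hypothetical interior bin edges; eps avoids log(0) for empty bins.
    """
    n_bins = len(cuts) + 1

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for v in values:
            counts[bisect.bisect_right(cuts, v)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref, cur = proportions(reference), proportions(new)
    # Each term (cur - ref) * ln(cur / ref) is non-negative, so PSI >= 0.
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    # Toy scores, not study data: four cases followed by four controls.
    scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    print(roc_auc(scores, labels))                   # 0.9375
    print(sens_spec(scores, labels, 0.5))            # (0.75, 0.75)
    print(ks_statistic(scores, labels))              # 0.75
    print(psi(scores, scores, [0.25, 0.5, 0.75]))    # 0.0
```

In this reading, AUC and K-S measure how well a model separates cases from controls, while PSI compares a model's score distribution in one cohort (for example, a validation set) with another (for example, the training set), so a value near 0 suggests a stable model. A commonly quoted credit-scoring rule of thumb treats PSI above about 0.25 as a large shift; the abstract does not say which threshold, if any, the authors applied.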

Keywords