Journal of Inflammation Research (Apr 2025)

Development and Validation of Predictive Models for Inflammatory Bowel Disease Diagnosis: A Machine Learning and Nomogram-Based Approach

  • Dong R,
  • Wang Y,
  • Yao H,
  • Chen T,
  • Zhou Q,
  • Zhao B,
  • Xu J

Journal volume & issue
Vol. 18
pp. 5115 – 5131

Abstract

Rongrong Dong,1,* Yiting Wang,2,* Han Yao,1 Taoran Chen,1 Qi Zhou,3 Bo Zhao,4 Jiancheng Xu1

1Department of Laboratory Medicine, First Hospital of Jilin University, Changchun, 130021, People’s Republic of China
2Department of Laboratory Medicine, Second Hospital of Jilin University, Changchun, 130022, People’s Republic of China
3Department of Pediatrics, First Hospital of Jilin University, Changchun, 130021, People’s Republic of China
4Department of Laboratory Medicine, Meihekou Central Hospital, Meihekou, 135000, People’s Republic of China

*These authors contributed equally to this work

Correspondence: Jiancheng Xu, Department of Laboratory Medicine, First Hospital of Jilin University, Xinmin Street, No. 1, Changchun City, 130021, People’s Republic of China, Tel +86-431-8878-2595, Fax +86-431-8878-6169, Email [email protected]

Inflammatory bowel disease (IBD) is a chronic, incurable gastrointestinal disease without a gold standard for diagnosis. This study aimed to develop predictive models for diagnosing IBD, Crohn’s disease (CD), and ulcerative colitis (UC) by combining two approaches: machine learning (ML) and traditional nomogram models.

Methods: Cohorts 1 and 2 comprised data from the UK Biobank (UKB) and the First Hospital of Jilin University, respectively: the initial laboratory tests upon admission for 1135 and 237 CD patients, 2192 and 326 UC patients, and 1798 and 298 non-IBD patients. Cohorts 1 and 2 were used to create predictive models. The parameters of the ML models established from Cohorts 1 and 2 were merged, and nomogram models were developed using logistic regression. Cohort 3 comprised initial laboratory tests from 117 CD patients, 197 UC patients, and 241 non-IBD patients at a tertiary hospital in different regions of China and was used for external testing of the three nomogram models.

Results: For Cohort 1, the ML-IBD-1, ML-CD-1, and ML-UC-1 models developed using the LightGBM algorithm demonstrated exceptional discrimination (ML-IBD-1: AUC = 0.788; ML-CD-1: AUC = 0.772; ML-UC-1: AUC = 0.841). For Cohort 2, the ML-IBD-2, ML-CD-2, and ML-UC-2 models developed using the XGBoost and logistic regression algorithms demonstrated exceptional discrimination (ML-IBD-2: AUC = 0.894; ML-CD-2: AUC = 0.932; ML-UC-2: AUC = 0.778). The nomogram models exhibited good diagnostic capability (nomogram-IBD: AUC = 0.778, 95% CI 0.688–0.868; nomogram-CD: AUC = 0.744, 95% CI 0.710–0.778; nomogram-UC: AUC = 0.702, 95% CI 0.591–0.814). The predictive ability of the three models was validated in Cohort 3 (nomogram-IBD: AUC = 0.758, 95% CI 0.683–0.832; nomogram-CD: AUC = 0.791, 95% CI 0.717–0.865; nomogram-UC: AUC = 0.817, 95% CI 0.702–0.932).

Conclusion: This study utilized three cohorts and developed risk prediction models for IBD, CD, and UC with good diagnostic capability, based on conventional laboratory data, using ML and nomograms.

Keywords: inflammatory bowel disease, Crohn’s disease, ulcerative colitis, machine learning, nomogram
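The results above are reported as an AUC with a 95% confidence interval. As a reading aid, the following minimal Python sketch shows one common way such figures can be obtained: the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and a percentile bootstrap for the interval. The abstract does not state how the authors computed their intervals, so the bootstrap is an assumption here, and the function names `auc` and `bootstrap_auc_ci` are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5.

    labels: sequence of 0/1 (1 = disease present); scores: model outputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    auc(labels, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Calling `auc(labels, scores)` on a held-out cohort gives the point estimate, and `bootstrap_auc_ci(labels, scores)` gives an approximate 95% interval around it. The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few thousand patients but would normally be replaced by a rank-based computation for larger data.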

Keywords