Scientific Reports (Feb 2024)

Linking gene expression to clinical outcomes in pediatric Crohn’s disease using machine learning

  • Kevin A. Chen,
  • Nina C. Nishiyama,
  • Meaghan M. Kennedy Ng,
  • Alexandria Shumway,
  • Chinmaya U. Joisa,
  • Matthew R. Schaner,
  • Grace Lian,
  • Caroline Beasley,
  • Lee-Ching Zhu,
  • Surekha Bantumilli,
  • Muneera R. Kapadia,
  • Shawn M. Gomez,
  • Terrence S. Furey,
  • Shehzad Z. Sheikh

DOI
https://doi.org/10.1038/s41598-024-52678-0
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 10

Abstract

Pediatric Crohn’s disease (CD) is characterized by a severe disease course with frequent complications. We sought to apply machine learning-based models to predict the risk of developing future complications in pediatric CD using ileal and colonic gene expression. Gene expression data were generated from 101 formalin-fixed, paraffin-embedded (FFPE) ileal and colonic biopsies obtained from treatment-naïve CD patients and controls. Clinical outcomes, including development of strictures or fistulas and progression to surgery, were analyzed using differential expression and modeled using machine learning. Differential expression analysis revealed downregulation of pathways related to inflammation and extracellular matrix production in patients with strictures. Machine learning-based models were able to incorporate colonic gene expression and clinical characteristics to predict outcomes with high accuracy. Models showed an area under the receiver operating characteristic curve (AUROC) of 0.84 for strictures, 0.83 for remission, and 0.75 for surgery. Genes with potential prognostic importance for strictures (REG1A, MMP3, and DUOX2) were not identified in single-gene differential analysis but were found to have strong contributions to predictive models. Our findings in FFPE tissue support the importance of colonic gene expression and the potential for machine learning-based models in predicting outcomes for pediatric CD.
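
For readers unfamiliar with the AUROC metric reported above, the following is a minimal sketch of how it can be computed from binary outcome labels and model scores. It is not the authors' pipeline: the function name `auroc` and the toy labels and scores are hypothetical and invented purely for illustration, and no study data are used. The sketch relies on the standard interpretation of AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = outcome occurred, 0 = it did not.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

# 11 of the 12 positive/negative pairs are ranked correctly.
toy_auc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auc:.3f}")
```

An AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a sense of scale for the values of 0.84, 0.83, and 0.75 reported in the abstract.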