Diagnostics (Dec 2021)

Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data

  • Soo-Kyung Park,
  • Sangsoo Kim,
  • Gi-Young Lee,
  • Sung-Yoon Kim,
  • Wan Kim,
  • Chil-Woo Lee,
  • Jong-Lyul Park,
  • Chang-Hwan Choi,
  • Sang-Bum Kang,
  • Tae-Oh Kim,
  • Ki-Bae Bang,
  • Jaeyoung Chun,
  • Jae-Myung Cha,
  • Jong-Pil Im,
  • Kwang-Sung Ahn,
  • Seon-Young Kim,
  • Dong-Il Park

DOI
https://doi.org/10.3390/diagnostics11122365
Journal volume & issue
Vol. 11, no. 12
p. 2365

Abstract

Crohn’s disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. Because differential diagnosis is important in establishing a long-term treatment plan for patients, we aimed to develop a machine learning model for the differential diagnosis of the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue from patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or normal tissues. The RNA-seq reads were mapped to the human reference genome (GRCh38), and expression was quantified for the corresponding gene models, which comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters for four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis was able to distinguish inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was minimal with only two components, which used 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of tissue and the two components revealed in this study may be helpful for distinguishing CD from UC.

Keywords