Development of a Machine Learning Model to Distinguish between Ulcerative Colitis and Crohn’s Disease Using RNA Sequencing Data
Soo-Kyung Park,
Sangsoo Kim,
Gi-Young Lee,
Sung-Yoon Kim,
Wan Kim,
Chil-Woo Lee,
Jong-Lyul Park,
Chang-Hwan Choi,
Sang-Bum Kang,
Tae-Oh Kim,
Ki-Bae Bang,
Jaeyoung Chun,
Jae-Myung Cha,
Jong-Pil Im,
Kwang-Sung Ahn,
Seon-Young Kim,
Dong-Il Park
Affiliations
Soo-Kyung Park
Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
Sangsoo Kim
Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Gi-Young Lee
Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Sung-Yoon Kim
Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Wan Kim
Department of Bioinformatics, Soongsil University, Seoul 06978, Korea
Chil-Woo Lee
Medical Research Institute, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
Jong-Lyul Park
Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
Chang-Hwan Choi
Department of Internal Medicine, College of Medicine, Chung-Ang University, Seoul 04388, Korea
Sang-Bum Kang
Department of Internal Medicine, College of Medicine, Daejeon St. Mary’s Hospital, The Catholic University of Korea, Daejeon 34943, Korea
Tae-Oh Kim
Department of Internal Medicine, Haeundae Paik Hospital, Inje University College of Medicine, Busan 48108, Korea
Ki-Bae Bang
Department of Internal Medicine, Dankook University College of Medicine, Cheonan 31116, Korea
Jaeyoung Chun
Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul 06273, Korea
Jae-Myung Cha
Department of Internal Medicine, Kyung Hee University Hospital at Gang Dong, Kyung Hee University College of Medicine, Seoul 05278, Korea
Jong-Pil Im
Department of Internal Medicine and Liver Research Institute, College of Medicine, Seoul National University, Seoul 03080, Korea
Kwang-Sung Ahn
Functional Genome Institute, PDXen Biosystems Inc., Daejeon 34129, Korea
Seon-Young Kim
Personalized Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Korea
Dong-Il Park
Division of Gastroenterology, Department of Internal Medicine and Inflammatory Bowel Disease Center, Kangbuk Samsung Hospital, Sungkyunkwan University School of Medicine, Seoul 03181, Korea
Crohn’s disease (CD) and ulcerative colitis (UC) can be difficult to differentiate. Because differential diagnosis is important for establishing a long-term treatment plan, we aimed to develop a machine learning model for the differential diagnosis of the two diseases using RNA sequencing (RNA-seq) data from endoscopic biopsy tissue of patients with inflammatory bowel disease (n = 127; CD, 94; UC, 33). Biopsy samples were taken from inflammatory lesions or normal tissues. The RNA-seq reads were mapped to the human reference genome (GRCh38), and expression was quantified for the corresponding gene models, which comprised 19,596 protein-coding genes. An unsupervised learning model showed distinct clusters for four classes: CD inflammatory, CD normal, UC inflammatory, and UC normal. A supervised learning model based on partial least squares discriminant analysis was able to distinguish inflammatory CD from inflammatory UC after pruning the strong classifiers of normal CD vs. normal UC. The error rate was minimized with only two components, which comprised 20 and 50 genes for the first and second components, respectively. The corresponding overall error rate was 0.147. RNA-seq analysis of tissue and the two components identified in this study may be helpful for distinguishing CD from UC.
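To make the supervised step more concrete, the following is a minimal sketch of a two-class partial least squares discriminant analysis with two components, followed by an overall error rate on held-out samples. It is an illustration under stated assumptions, not the pipeline used in the study: the expression matrix is synthetic, the function names `fit_plsda` and `predict_plsda` are hypothetical, the 0/1 class coding and the 0.5 decision threshold are illustrative choices, and the per-component gene selection (20 and 50 genes) is not implemented.

```python
import numpy as np


def fit_plsda(X, y, n_components=2):
    """Fit a two-class PLS-DA model (PLS1 via NIPALS) on 0/1 labels."""
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean  # centered residuals, deflated per component
    weights, loadings, coefs = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr                  # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = Xr @ w                     # sample scores on this component
        tt = t @ t
        p = Xr.T @ t / tt              # predictor loadings
        q = yr @ t / tt                # regression of y on the scores
        Xr = Xr - np.outer(t, p)       # deflate predictors
        yr = yr - q * t                # deflate response
        weights.append(w)
        loadings.append(p)
        coefs.append(q)
    return x_mean, y_mean, weights, loadings, coefs


def predict_plsda(model, X, threshold=0.5):
    """Predict 0/1 labels by replaying the component-wise deflation."""
    x_mean, y_mean, weights, loadings, coefs = model
    Xr = X - x_mean
    score = np.full(X.shape[0], y_mean)
    for w, p, q in zip(weights, loadings, coefs):
        t = Xr @ w
        score = score + q * t
        Xr = Xr - np.outer(t, p)
    return (score > threshold).astype(int)


# Synthetic stand-in for an expression matrix (NOT the study data):
# 60 samples x 200 "genes", with the first 10 genes shifted in class 1.
rng = np.random.default_rng(0)
labels = rng.permutation(np.repeat([0, 1], 30))
expr = rng.normal(size=(60, 200))
expr[labels == 1, :10] += 3.0

# Hold out the last 20 samples to estimate the overall error rate.
model = fit_plsda(expr[:40], labels[:40], n_components=2)
pred = predict_plsda(model, expr[40:])
error_rate = float(np.mean(pred != labels[40:]))
print(f"overall error rate on held-out samples: {error_rate:.3f}")
```

The overall error rate here is simply the fraction of misclassified held-out samples. A sparse variant would additionally keep only a fixed number of nonzero gene weights per component, which is what limits each component to a short gene list.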