Deep-Learning Algorithm and Concomitant Biomarker Identification for NSCLC Prediction Using Multi-Omics Data Integration
Min-Koo Park,
Jin-Muk Lim,
Jinwoo Jeong,
Yeongjae Jang,
Ji-Won Lee,
Jeong-Chan Lee,
Hyungyu Kim,
Euiyul Koh,
Sung-Joo Hwang,
Hong-Gee Kim,
Keun-Cheol Kim
Affiliations
Min-Koo Park
Department of Biological Sciences, College of Natural Sciences, Kangwon National University, Chuncheon 24341, Republic of Korea
Jin-Muk Lim
Biomedical Knowledge Engineering Laboratory, School of Dentistry and Dental Research Institute, Seoul National University, Seoul 08826, Republic of Korea
Jinwoo Jeong
AI Institute, Alopax-Algo, Co., Ltd., Seoul 06978, Republic of Korea
Yeongjae Jang
Medical AI Team, Jonathan Wellcare Division, Acryl, Inc., Seoul 06069, Republic of Korea
Ji-Won Lee
Hugenebio Institute, Bio-Innovation Park, Erom, Inc., Chuncheon 24427, Republic of Korea
Jeong-Chan Lee
Hugenebio Institute, Bio-Innovation Park, Erom, Inc., Chuncheon 24427, Republic of Korea
Hyungyu Kim
Medical AI Team, Jonathan Wellcare Division, Acryl, Inc., Seoul 06069, Republic of Korea
Euiyul Koh
Medical AI Team, Jonathan Wellcare Division, Acryl, Inc., Seoul 06069, Republic of Korea
Sung-Joo Hwang
Integrated Medicine Institute, Loving Care Hospital, Seongnam 463400, Republic of Korea
Hong-Gee Kim
Biomedical Knowledge Engineering Laboratory, School of Dentistry and Dental Research Institute, Seoul National University, Seoul 08826, Republic of Korea
Keun-Cheol Kim
Department of Biological Sciences, College of Natural Sciences, Kangwon National University, Chuncheon 24341, Republic of Korea
Early diagnosis of lung cancer remains a critical need for improving the survival rate, which currently stands in the mid-30% range. Despite this, multi-omics data have rarely been applied to the diagnosis of non-small-cell lung cancer (NSCLC). We developed a multi-omics-oriented artificial intelligence algorithm, based on a graph convolutional network, that integrates mRNA expression, DNA methylation, and DNA sequencing data. This NSCLC prediction model achieved a macro F1-score of 93.7%, indicating low rates of both false positives and false negatives, which is desirable for accurate classification. Gene ontology (GO) enrichment and pathway analysis of the model's features revealed that the two major subtypes of NSCLC, lung adenocarcinoma and lung squamous cell carcinoma, share some GO biological processes and also have subtype-specific ones. Numerous biomarkers (e.g., microRNAs, long non-coding RNAs, and differentially methylated regions) were newly identified, whereas others were consistent with previous findings in NSCLC (e.g., SPRR1B). Thus, by integrating multi-omics data, we developed a promising cancer prediction algorithm.
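The reported performance is a macro F1-score, which computes F1 = 2TP / (2TP + FP + FN) separately for each class and then takes the unweighted mean, so it stays high only if false positives and false negatives are low for every class, not just the largest one. The following is a minimal illustrative sketch using only the Python standard library; the class labels ("LUAD", "LUSC", "normal") and the toy predictions are hypothetical and are not the study's data or class setup. It should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")` on the same inputs.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Macro-averaged F1: per-class F1 scores averaged with equal weight."""
    # Classes are taken from the union of true and predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted, but the sample is not p
            fn[t] += 1  # a true sample of class t was missed
    per_class = []
    for c in labels:
        # Every label appears in y_true or y_pred, so the denominator is >= 1.
        per_class.append(2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]))
    return sum(per_class) / len(per_class)


# Hypothetical toy labels (not from the study): one LUAD sample is misclassified as LUSC.
y_true = ["LUAD", "LUAD", "LUSC", "LUSC", "normal", "normal"]
y_pred = ["LUAD", "LUSC", "LUSC", "LUSC", "normal", "normal"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.8222
```

Here the per-class F1 values are 2/3 (LUAD), 0.8 (LUSC), and 1.0 (normal), so a single error drags the macro average down noticeably even though five of six predictions are correct.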