Explaining the Genetic Causality for Complex Phenotype via Deep Association Kernel Learning
Feng Bao,
Yue Deng,
Mulong Du,
Zhiquan Ren,
Sen Wan,
Kenny Ye Liang,
Shaohua Liu,
Bo Wang,
Junyi Xin,
Feng Chen,
David C. Christiani,
Meilin Wang,
Qionghai Dai
Affiliations
Feng Bao
Department of Automation, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China
Yue Deng
School of Astronautics, Beihang University, Beijing 100191, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing 100191, China; Corresponding author
Mulong Du
Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Zhiquan Ren
Department of Automation, Tsinghua University, Beijing 100084, China
Sen Wan
Department of Automation, Tsinghua University, Beijing 100084, China
Kenny Ye Liang
Department of Automation, Tsinghua University, Beijing 100084, China
Shaohua Liu
School of Astronautics, Beihang University, Beijing 100191, China
Bo Wang
School of Astronautics, Beihang University, Beijing 100191, China
Junyi Xin
Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing 211166, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
Feng Chen
Department of Biostatistics, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China
David C. Christiani
Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA; Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
Meilin Wang
Department of Environmental Genomics, Jiangsu Key Laboratory of Cancer Biomarkers, Prevention and Treatment, Collaborative Innovation Center for Cancer Personalized Medicine, Nanjing Medical University, Nanjing 211166, China; Department of Genetic Toxicology, The Key Laboratory of Modern Toxicology of Ministry of Education, Center for Global Health, School of Public Health, Nanjing Medical University, Nanjing 211166, China; Corresponding author
Qionghai Dai
Department of Automation, Tsinghua University, Beijing 100084, China; Institute for Brain and Cognitive Sciences, Tsinghua University, Beijing 100084, China; Corresponding author
Summary: The genetic effect explains the causality from genetic mutations to the development of complex diseases. Existing genome-wide association study (GWAS) approaches are always built under a linear assumption, which restricts their ability to dissect complicated causality such as the recessive genetic effect. A sophisticated and general GWAS model that can work with different types of genetic effects is therefore highly desirable. Here, we introduce a deep association kernel learning (DAK) model to enable automatic causal genotype encoding for GWAS at the pathway level. DAK can detect both common and rare variants with complicated genetic effects where existing approaches fail. When applied to four real-world GWAS datasets, including cancers and schizophrenia, DAK discovered potential causal pathways, including the association between the dilated cardiomyopathy pathway and schizophrenia.
The Bigger Picture: Genetic mutations cause complex diseases in many different ways. Comprehensively identifying the genetic causality can lead to valuable insights into the development and treatment of diseases. However, existing genome-wide association study (GWAS) approaches are always built under a linear assumption and simple disease models, which restricts their ability to discover complicated causality. DAK (deep association kernel learning) is a GWAS method that is constructed in a deep-learning framework and can simultaneously identify multiple types of genetic causality without any modification to the model. As a biological contribution, the proposed approach enables the understanding of non-linear, complex genetic causality and improves functional studies of disease; as a computational contribution, it unifies kernel learning and association analysis in a joint explainable deep-learning framework.
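As a small illustration of why a linear (additive) genotype coding can understate a recessive genetic effect, consider the toy sketch below. It is not part of DAK and does not reproduce any analysis from the paper: the genotype values, the `recessive_encoding` helper, and the `pearson` helper are all hypothetical, chosen only to make the contrast concrete.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def recessive_encoding(minor_allele_count):
    """Hypothetical recessive coding: 1 only with two minor-allele copies."""
    return 1 if minor_allele_count == 2 else 0

# Hypothetical minor-allele counts (0, 1, or 2) for ten individuals.
genotypes = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]

# Toy phenotype that follows a purely recessive model (assumption).
phenotype = [recessive_encoding(g) for g in genotypes]

# Additive coding uses the raw allele count as a linear predictor.
additive_r = pearson(genotypes, phenotype)

# Recessive coding matches the generating model exactly.
recessive_r = pearson([recessive_encoding(g) for g in genotypes], phenotype)

print(f"additive coding  r = {additive_r:.3f}")
print(f"recessive coding r = {recessive_r:.3f}")
```

In this toy setting the additive coding yields a weaker correlation with the phenotype than the recessive coding does. A model that learns the genotype encoding from data is meant to close this kind of gap without the effect type being specified in advance.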