Deep mixed model for marginal epistasis detection and population stratification correction in genome-wide association studies

Haohan Wang; Tianwei Yue; Jingkang Yang; Wei Wu; Eric P. Xing

doi:10.1186/s12859-019-3300-9

BMC Bioinformatics (Dec 2019)

Affiliations

Haohan Wang: Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Tianwei Yue: Language Technologies Institute, School of Computer Science, Carnegie Mellon University
Jingkang Yang: Department of Electrical and Computer Engineering, Rice University
Wei Wu: Computational Biology Department, School of Computer Science, Carnegie Mellon University
Eric P. Xing: Machine Learning Department, School of Computer Science, Carnegie Mellon University

DOI: https://doi.org/10.1186/s12859-019-3300-9
Journal volume & issue: Vol. 20, no. S23, pp. 1–11

Abstract

Background: Genome-wide association studies (GWAS) have contributed to unraveling associations between genetic variants in the human genome and complex traits for more than a decade. Although many follow-up methods have been developed to detect interactions between SNPs, epistasis has yet to be modeled and discovered more thoroughly.

Results: Following a previous study on detecting marginal epistasis signals, and motivated by the universal approximation power of deep learning, we propose a neural network method that can potentially model arbitrary interactions between SNPs in genetic association studies, as an extension of the mixed models used to correct for confounding factors. Our method, the Deep Mixed Model, consists of two components: 1) a confounding factor correction component, a large-kernel convolutional neural network that focuses on calibrating the residual phenotypes by removing factors such as population stratification, and 2) a fixed-effect estimation component, which mainly consists of a Long Short-Term Memory (LSTM) model that estimates the association effect size of SNPs with the residual phenotype.

Conclusions: After validating the performance of our method in simulation experiments, we apply it to Alzheimer’s disease data sets. Our results offer an exploratory understanding of the genetic architecture of Alzheimer’s disease.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
