Identifying novel associations in GWAS by hierarchical Bayesian latent variable detection of differentially misclassified phenotypes

Afrah Shafquat; Ronald G. Crystal; Jason G. Mezey

doi:10.1186/s12859-020-3387-z

BMC Bioinformatics (May 2020)

Affiliations

Afrah Shafquat: Department of Computational Biology, Cornell University
Ronald G. Crystal: Department of Genetic Medicine, Weill Cornell Medicine
Jason G. Mezey: Department of Computational Biology, Cornell University

Journal volume & issue: Vol. 21, no. 1, pp. 1–25

Abstract

Background: Heterogeneity in the definition and measurement of complex diseases in genome-wide association studies (GWAS) may lead to misdiagnoses and misclassification errors that can significantly impact the discovery of disease loci. Although this problem is well appreciated, almost all analyses of GWAS data take reported disease phenotype values as is, without accounting for potential misclassification.

Results: Here, we introduce Phenotype Latent variable Extraction of disease misdiagnosis (PheLEx), a GWAS analysis framework that learns and corrects misclassified phenotypes using structured genotype associations within a dataset. PheLEx consists of a hierarchical Bayesian latent variable model in which inference of differential misclassification is accomplished using filtered genotypes, while a full mixed model accounts for population structure and genetic relatedness in study populations. Through simulations, we show that, for realistic allele effect sizes, the PheLEx framework dramatically improves recovery of the correct disease state compared to existing methodologies designed for Bayesian recovery of disease phenotypes. We also demonstrate the potential of PheLEx for extracting new potential loci from existing GWAS data by analyzing bipolar disorder and epilepsy phenotypes available from the UK Biobank. From the PheLEx analysis of these data, we identified new candidate disease loci, not previously reported for these datasets, that have value for supplemental hypothesis generation.

Conclusion: PheLEx shows promise in reanalyzing GWAS datasets to provide supplemental candidate loci that are ignored by traditional GWAS analysis methodologies.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
