Comparison of mixed model based approaches for correcting for population substructure with application to extreme phenotype sampling

Maryam Onifade; Marie-Hélène Roy-Gagnon; Marie-Élise Parent; Kelly M. Burkett

doi:10.1186/s12864-022-08297-y

BMC Genomics (Feb 2022)

Affiliations

Maryam Onifade: Department of Mathematics and Statistics
Marie-Hélène Roy-Gagnon: School of Epidemiology and Public Health
Marie-Élise Parent: Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique
Kelly M. Burkett: Department of Mathematics and Statistics

Journal volume & issue: Vol. 23, no. 1
pp. 1 – 12

Abstract

Background

Mixed models are used to correct for confounding due to population stratification and hidden relatedness in genome-wide association studies. This class of models includes linear mixed models and generalized linear mixed models. Existing mixed model approaches to correct for population substructure have previously been investigated with both continuous and case-control response variables. However, they have not been investigated in the context of extreme phenotype sampling (EPS), where genetic covariates are collected only on samples with extreme response variable values. In this work, we compare the performance of existing binary trait mixed model approaches (GMMAT, LEAP and CARAT) on EPS data. Since linear mixed models are commonly used even with binary traits, we also evaluate the performance of a popular linear mixed model implementation (GEMMA).

Results

We used simulation studies to estimate the type I error rate and power of all approaches, assuming a population with substructure. Our simulation results show that, for a common candidate variant, both LEAP and GMMAT control the type I error rate, while CARAT’s rate remains inflated. We applied all methods to a real dataset from a case-control study in Québec, Canada, that is known to have population substructure, and observed similar type I error control in that analysis. For rare variants, the false positive rate remains inflated even after correction with mixed model approaches. Among the methods that control the type I error rate, the estimated power is comparable.

Conclusions

The methods compared in this study differ in their type I error control. Therefore, when data come from an EPS study, care should be taken to ensure that the models underlying the methodology suit the sampling strategy and the minor allele frequency of the candidate SNPs.
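For orientation, the two model classes named in the abstract can be written in a common textbook form. The notation here is assumed and not taken from the paper. In it, y is the phenotype vector, X holds fixed covariates, g is the genotype vector of the candidate SNP with effect γ, and K is a genetic relationship matrix estimated from genome-wide SNPs. The random effect u absorbs substructure and relatedness. The first line is a linear mixed model, the type fit by GEMMA. The second is a logistic generalized linear mixed model, the type fit by GMMAT, where each binary response Y_i is Bernoulli with probability π_i. In both cases the association test is of H0: γ = 0.

```latex
\begin{aligned}
\mathbf{y} &= X\boldsymbol{\beta} + \mathbf{g}\gamma + \mathbf{u} + \boldsymbol{\varepsilon},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \sigma_g^2 K), \quad
  \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \sigma_e^2 I), \\
\operatorname{logit}(\boldsymbol{\pi}) &= X\boldsymbol{\beta} + \mathbf{g}\gamma + \mathbf{u},
  \qquad \mathbf{u} \sim N(\mathbf{0}, \tau K), \quad
  Y_i \sim \operatorname{Bernoulli}(\pi_i).
\end{aligned}
```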

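The following is a minimal, hypothetical sketch of why substructure matters under extreme phenotype sampling and how a type I error rate is estimated by simulation. It is not the authors' simulation design, and it does not implement GMMAT, LEAP, CARAT or GEMMA. It simulates two subpopulations that differ in both allele frequency and mean phenotype, with a SNP that has no effect on the phenotype. It then keeps only the lowest and highest 10% of phenotypes and applies a naive allelic z-test that ignores structure. The subpopulation frequencies (0.2 vs 0.5), the phenotype shift, the tail fraction and all function names are illustrative assumptions.

```python
import math
import random


def extreme_phenotype_sample(phenotypes, tail_frac=0.1):
    """EPS: indices of the lowest and highest tail_frac of phenotype values."""
    if not 0 < tail_frac <= 0.5:
        raise ValueError("tail_frac must be in (0, 0.5] so the tails do not overlap")
    n = len(phenotypes)
    k = round(n * tail_frac)
    order = sorted(range(n), key=lambda i: phenotypes[i])
    return order[:k], order[n - k:]


def allelic_z_test(genos_low, genos_high):
    """Two-sided two-proportion z-test on allele counts (genotypes coded 0/1/2).

    Deliberately ignores population structure: this is the naive test that
    mixed model corrections are meant to improve upon.
    """
    n1, n2 = 2 * len(genos_low), 2 * len(genos_high)
    p1, p2 = sum(genos_low) / n1, sum(genos_high) / n2
    pooled = (sum(genos_low) + sum(genos_high)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (p2 - p1) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value


def type1_error_rate(p_values, alpha=0.05):
    """Proportion of p-values from null replicates that fall at or below alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)


def simulate_null_eps(n=2000, reps=200, seed=1):
    """Hypothetical null simulation: the SNP has no phenotypic effect, but it is
    confounded because allele frequency and phenotype mean both differ
    between two equally sized subpopulations."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(reps):
        genos, phenos = [], []
        for _ in range(n):
            in_pop2 = rng.random() < 0.5
            freq = 0.5 if in_pop2 else 0.2   # assumed allele frequencies
            mean = 1.0 if in_pop2 else 0.0   # assumed phenotype shift
            genos.append((rng.random() < freq) + (rng.random() < freq))
            phenos.append(rng.gauss(mean, 1.0))
        low, high = extreme_phenotype_sample(phenos)
        pvals.append(allelic_z_test([genos[i] for i in low],
                                    [genos[i] for i in high]))
    return type1_error_rate(pvals)


if __name__ == "__main__":
    print("naive type I error rate under EPS:", simulate_null_eps())
```

Under these assumed settings the naive test rejects far more often than the nominal 5%. The upper tail is enriched for one subpopulation and the lower tail for the other, so allele frequencies differ between the extremes even though the SNP has no effect. This is the kind of inflation the mixed model approaches compared in the paper aim to correct.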
Published in BMC Genomics

ISSN: 1471-2164 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Biology (General): Genetics
Website: http://bmcgenomics.biomedcentral.com