Population stratification in the context of diverse epidemiologic surveys sans genome-wide data

Matthew T. Oetjens; Kristin Brown-Gentry; Robert Goodloe; Holli H. Dilks; Dana C. Crawford

doi:10.3389/fgene.2016.00076

Frontiers in Genetics (May 2016)

Affiliations

Matthew T. Oetjens: Vanderbilt University
Kristin Brown-Gentry: Vanderbilt University
Robert Goodloe: Vanderbilt University
Holli H. Dilks: Sarah Cannon Research Institute
Dana C. Crawford: Case Western Reserve University

Journal volume & issue: Vol. 7

Abstract

Population stratification, or confounding by genetic ancestry, is a potential cause of false associations in genetic association studies. Estimation of and adjustment for genetic ancestry has become common practice, thanks in part to the availability of ancestry informative markers on genome-wide association study (GWAS) arrays. While array data are now widespread, they are not ubiquitous, as several large epidemiologic and clinic-based studies lack genome-wide data. One such large epidemiologic study lacking genome-wide data accessible to investigators is the National Health and Nutrition Examination Surveys (NHANES), population-based cross-sectional surveys of Americans linked to demographic, health, and lifestyle data conducted by the Centers for Disease Control and Prevention. DNA samples (n=14,998) were extracted from biospecimens from consented NHANES participants in 1991-1994 (NHANES III, phase 2) and 1999-2002 and represent three major self-identified racial/ethnic groups: non-Hispanic whites (n=6,634), non-Hispanic blacks (n=3,458), and Mexican Americans (n=3,950). We, as the Epidemiologic Architecture for Genes Linked to Environment (EAGLE) study, genotyped candidate gene and GWAS-identified index variants in NHANES as part of the larger Population Architecture using Genomics and Epidemiology (PAGE) I study for collaborative genetic association studies. To enable basic quality control, such as estimation of genetic ancestry to control for population stratification in NHANES sans genome-wide data, we outline here strategies that use limited genetic data to identify the markers optimal for characterizing genetic ancestry. From among 411 and 295 autosomal SNPs available in NHANES III and NHANES 1999-2002, respectively, we demonstrate that markers with ancestry information can be identified to estimate global ancestry. Despite limited resolution, global genetic ancestry is highly correlated with self-identified race for the majority of participants, although less so for ethnicity. Overall, the strategies outlined here for a large epidemiologic study can be applied to other datasets that are accessible for genotype-phenotype studies but lack genome-wide data.

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
