Accounting for population structure in genetic studies of cystic fibrosis
Hanley Kingston,
Adrienne M. Stilp,
William Gordon,
Jai Broome,
Stephanie M. Gogarten,
Hua Ling,
John Barnard,
Shannon Dugan-Perez,
Patrick T. Ellinor,
Stacey Gabriel,
Soren Germer,
Richard A. Gibbs,
Namrata Gupta,
Kenneth Rice,
Albert V. Smith,
Michael C. Zody,
Scott M. Blackman,
Garry Cutting,
Michael R. Knowles,
Yi-Hui Zhou,
Margaret Rosenfeld,
Ronald L. Gibson,
Michael Bamshad,
Alison Fohner,
Elizabeth E. Blue
Affiliations
Hanley Kingston
Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA
Adrienne M. Stilp
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
William Gordon
Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA
Jai Broome
Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA
Stephanie M. Gogarten
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Hua Ling
Department of Genetic Medicine, Center for Inherited Disease Research, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
John Barnard
Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic, Cleveland, OH 44195, USA
Shannon Dugan-Perez
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
Patrick T. Ellinor
Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA 02114, USA
Stacey Gabriel
Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Soren Germer
New York Genome Center, New York, NY 10013, USA
Richard A. Gibbs
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
Namrata Gupta
Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Kenneth Rice
Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
Albert V. Smith
Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
Michael C. Zody
New York Genome Center, New York, NY 10013, USA
Scott M. Blackman
Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Garry Cutting
McKusick-Nathans Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Michael R. Knowles
Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Yi-Hui Zhou
Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
Margaret Rosenfeld
Center for Clinical and Translational Research, Seattle Children’s Hospital, Seattle, WA 98105, USA; Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
Ronald L. Gibson
Center for Clinical and Translational Research, Seattle Children’s Hospital, Seattle, WA 98105, USA; Department of Pediatrics, University of Washington, Seattle, WA 98195, USA
Michael Bamshad
Department of Pediatrics, Division of Genetic Medicine, University of Washington, Seattle, WA 98195, USA; Center for Clinical and Translational Research, Seattle Children’s Hospital, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA
Alison Fohner
Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
Elizabeth E. Blue
Institute for Public Health Genetics, University of Washington, Seattle, WA 98195, USA; Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute for Precision Medicine, Seattle, WA 98195, USA; Corresponding author
Summary: CFTR F508del (c.1521_1523delCTT, p.Phe508delPhe) is the most common pathogenic allele underlying cystic fibrosis (CF), and its frequency varies in a geographic cline across Europe. We hypothesized that genetic variation associated with this cline is overrepresented in a large cohort (N > 5,000) of persons with CF who underwent whole-genome sequencing and that this pattern could result in spurious associations between variants correlated with both the F508del genotype and CF-related outcomes. Using principal-component (PC) analyses, we showed that variation in the CFTR region disproportionately contributes to a PC explaining a relatively high proportion of genetic variance. Variation near CFTR was correlated with population structure among persons with CF, and this correlation was driven by a subset of the sample inferred to have European ancestry. We performed genome-wide association studies comparing persons with CF carrying one versus two copies of the F508del allele; this allowed us to identify genetic variation associated with the F508del allele and to determine that standard PC-adjustment strategies eliminated the significant association signals. Our results suggest that PC adjustment can adequately prevent spurious associations between genetic variants and CF-related traits and is therefore an effective tool to control for population structure, even when population structure is confounded with disease severity and a common pathogenic variant.
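The logic of PC adjustment described in the summary can be illustrated with a small simulation. The sketch below is not the study's pipeline: the data are simulated, all names (`top_pcs`, `variant_t_stat`, `chi2_raw`, `chi2_adj`) are hypothetical, and the design is an assumption chosen for clarity (two subpopulations, 300 variants, a phenotype that depends only on subpopulation, ordinary least squares). It shows that single-variant test statistics are inflated when structure is ignored and that including the top PCs as covariates removes most of that inflation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): two subpopulations whose
# allele frequencies differ by 0.15 at every variant.
n_per_pop, n_variants = 250, 300
freq_a = rng.uniform(0.2, 0.8, size=n_variants)
freq_b = freq_a + rng.choice([-0.15, 0.15], size=n_variants)  # stays in [0.05, 0.95]
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_pop, n_variants)),
    rng.binomial(2, freq_b, size=(n_per_pop, n_variants)),
]).astype(float)
pop = np.repeat([0.0, 1.0], n_per_pop)

# The phenotype depends on subpopulation only; no variant is causal, so any
# variant-level association is spurious.
pheno = 2.0 * pop + rng.normal(size=pop.size)


def top_pcs(genotypes, k):
    """Return the top-k principal-component scores of standardized genotypes."""
    keep = genotypes.std(axis=0) > 0          # drop monomorphic variants
    g = genotypes[:, keep]
    z = (g - g.mean(axis=0)) / g.std(axis=0)  # center and scale each variant
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


def variant_t_stat(y, g, covars=None):
    """t statistic for the genotype term in an OLS fit of y on g (+ covariates)."""
    cols = [np.ones_like(y), g]
    if covars is not None:
        cols.append(covars)
    x = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ beta
    sigma2 = resid @ resid / (len(y) - x.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(x.T @ x)[1, 1])
    return beta[1] / se


pcs = top_pcs(geno, k=2)
chi2_raw = np.array([variant_t_stat(pheno, geno[:, j]) ** 2
                     for j in range(n_variants)])
chi2_adj = np.array([variant_t_stat(pheno, geno[:, j], pcs) ** 2
                     for j in range(n_variants)])
pc1_pop_corr = abs(np.corrcoef(pcs[:, 0], pop)[0, 1])

print(f"mean test statistic without PCs: {chi2_raw.mean():.2f}")
print(f"mean test statistic with PCs:    {chi2_adj.mean():.2f}")
print(f"|corr(PC1, subpopulation)|:      {pc1_pop_corr:.2f}")
```

Under the null, the squared t statistic has a mean close to 1, so a mean well above 1 in the unadjusted run indicates confounding by structure. For simplicity, the PCs here are computed from the same variants that are tested; a real analysis would typically compute them from a pruned, genome-wide variant set and use a model suited to the trait.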