Leveraging TOPMed imputation server and constructing a cohort-specific imputation reference panel to enhance genotype imputation among cystic fibrosis patients
Quan Sun,
Weifang Liu,
Jonathan D. Rosen,
Le Huang,
Rhonda G. Pace,
Hong Dang,
Paul J. Gallins,
Elizabeth E. Blue,
Hua Ling,
Harriet Corvol,
Lisa J. Strug,
Michael J. Bamshad,
Ronald L. Gibson,
Elizabeth W. Pugh,
Scott M. Blackman,
Garry R. Cutting,
Wanda K. O'Neal,
Yi-Hui Zhou,
Fred A. Wright,
Michael R. Knowles,
Jia Wen,
Yun Li
Affiliations
Quan Sun
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Weifang Liu
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Jonathan D. Rosen
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Le Huang
Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Rhonda G. Pace
Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Hong Dang
Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Paul J. Gallins
Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
Elizabeth E. Blue
Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Brotman Baty Institute, Seattle, WA 98195, USA
Hua Ling
Center for Inherited Disease Research (CIDR), Johns Hopkins University, Baltimore, MD 21205, USA; McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Harriet Corvol
Sorbonne Université, Inserm, Centre de Recherche Saint-Antoine, Assistance Publique-Hôpitaux de Paris (APHP), Hôpital Trousseau, Service de Pneumologie Pédiatrique, Paris, France
Lisa J. Strug
Departments of Statistical Sciences and Computer Science and Division of Biostatistics, University of Toronto, Toronto, ON, Canada; Program in Genetics and Genome Biology and The Centre for Applied Genomics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
Michael J. Bamshad
Department of Pediatrics, University of Washington, Seattle, WA 98105, USA; Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Division of Genetic Medicine, Seattle Children's Hospital, Seattle, WA 98105, USA; Brotman Baty Institute, Seattle, WA 98195, USA
Ronald L. Gibson
Department of Pediatrics, University of Washington, Seattle, WA 98105, USA
Elizabeth W. Pugh
Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Scott M. Blackman
Division of Pediatric Endocrinology, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Garry R. Cutting
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA; Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21287, USA
Wanda K. O'Neal
Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Yi-Hui Zhou
Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
Fred A. Wright
Bioinformatics Research Center and Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA; Department of Biological Sciences, North Carolina State University, Raleigh, NC 27695, USA
Michael R. Knowles
Marsico Lung Institute/UNC CF Research Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
Jia Wen
Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Corresponding author
Yun Li
Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Corresponding author
Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective at increasing the effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotypes using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3–4.2 million genotyped markers to approximately 11–43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% of the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate that our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and that the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.