HGG Advances (Apr 2022)

Leveraging TOPMed imputation server and constructing a cohort-specific imputation reference panel to enhance genotype imputation among cystic fibrosis patients

  • Quan Sun,
  • Weifang Liu,
  • Jonathan D. Rosen,
  • Le Huang,
  • Rhonda G. Pace,
  • Hong Dang,
  • Paul J. Gallins,
  • Elizabeth E. Blue,
  • Hua Ling,
  • Harriet Corvol,
  • Lisa J. Strug,
  • Michael J. Bamshad,
  • Ronald L. Gibson,
  • Elizabeth W. Pugh,
  • Scott M. Blackman,
  • Garry R. Cutting,
  • Wanda K. O'Neal,
  • Yi-Hui Zhou,
  • Fred A. Wright,
  • Michael R. Knowles,
  • Jia Wen,
  • Yun Li

Journal volume & issue
Vol. 3, no. 2
p. 100090

Abstract

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system, and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotypes using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3–4.2 million genotyped markers to approximately 11–43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% of the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate that our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and that the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.

Keywords