npj Genomic Medicine (Dec 2022)

A robust pipeline for ranking carrier frequencies of autosomal recessive and X-linked Mendelian disorders

  • Wenjuan Zhu,
  • Chen Wang,
  • Nandita Mullapudi,
  • Yanan Cao,
  • Lin Li,
  • Ivan Fai Man Lo,
  • Stephen Kwok-Wing Tsui,
  • Xiao Chen,
  • Yong Lei,
  • Shen Gu

DOI
https://doi.org/10.1038/s41525-022-00344-7
Journal volume & issue
Vol. 7, no. 1
pp. 1 – 10

Abstract

Read online

Abstract Single gene disorders are individually rare but collectively common leading causes of neonatal and pediatric morbidity and mortality. Both parents or the mothers of affected individuals with autosomal recessive or X-linked recessive diseases, respectively, are carrier(s). Carrier frequencies of recessive diseases can vary drastically among different ethnicities. This study established a robust pipeline for estimating and ranking carrier frequencies of all known 2699 recessive genes based on genome-wide sequencing data in healthy individuals. The discovery gnomAD cohort contained sequencing data on 76,156 genomes and 125,748 exomes from individuals with seven ethnicity backgrounds. The three validation cohorts composed of the SG10K Project with 4810 genomes on East Asian and South Asian, the ChinaMAP project with 10,588 Chinese genomes, and the WBBC pilot project with 4480 Chinese genomes. Within each cohort, comprehensive selection criteria for various kinds of deleterious variants were instituted, including known pathogenic variants (Type 1), presumably loss-of-function changes (Type 2), predicted deleterious missense variants (Type 3), and potentially harmful in-frame INDELs (Type 4). Subsequently, carrier frequencies of the 2699 genes were calculated and ranked based on ethnicity-specific carrier rates of Type 1 to Type 4 variants. Comparison of results from different cohorts with similar ethnicity background exhibited high degree of correlation, particularly between the ChinaMAP and the WBBC cohorts (Pearson correlation coefficient R = 0.92), confirming the validity of our variant selection criteria and the overall analysis pipeline.