Genetics Selection Evolution (Jan 2024)
A cautionary tale of low-pass sequencing and imputation with respect to haplotype accuracy
Abstract
Abstract Background Low-pass whole-genome sequencing and imputation offer significant cost savings, enabling substantial increases in sample size and statistical power. This approach is particularly promising in livestock breeding, providing an affordable means of screening individuals for deleterious alleles or calculating genomic breeding values. Consequently, it may also be of value in companion animal genomics to support pedigree breeding. We sought to evaluate in dogs the impact of low coverage sequencing and reference-guided imputation on genotype concordance and association analyses. Results DNA isolated from saliva of 30 Labrador retrievers was sequenced at low (0.9X and 3.8X) and high (43.5X) coverage, and down-sampled from 43.5X to 9.6X and 17.4X. Genotype imputation was performed using a diverse reference panel (1021 dogs), and two subsets of the former panel (256 dogs each) where one had an excess of Labrador retrievers relative to other breeds. We observed little difference in imputed genotype concordance between reference panels. Association analyses for a locus acting as a disease proxy were performed using single-marker (GEMMA) and haplotype-based (XP-EHH) tests. GEMMA results were highly correlated (r ≥ 0.97) between 43.5X and ≥ 3.8X depths of coverage, while for 0.9X the correlation was lower (r ≤ 0.8). XP-EHH results were less well correlated, with r ranging from 0.58 (0.9X) to 0.88 (17.4X). Across a random sample of 10,000 genomic regions averaging 17 kb in size, we observed a median of three haplotypes per dog across the sequencing depths, with 5% of the regions returning more than eight haplotypes. Inspection of one such region revealed genotype and phasing inconsistencies across sequencing depths. Conclusions We demonstrate that saliva-derived canine DNA is suitable for whole-genome sequencing, highlighting the feasibility of client-based sampling. Low-pass sequencing and imputation require caution as incorrect allele assignments result when the subject possesses alleles that are absent in the reference panel. Larger panels have the capacity for greater allelic diversity, which should reduce the potential for imputation error. Although low-pass sequencing can accurately impute allele dosage, we highlight issues with phasing accuracy that impact haplotype-based analyses. Consequently, if accurately phased genotypes are required for analyses, we advocate sequencing at high depth (> 20X).