Animal (Sep 2024)

Imputation strategies for low-coverage whole-genome sequencing data and their effects on genomic prediction and genome-wide association studies in pigs

  • X.Q. Wang,
  • L.G. Wang,
  • L.Y. Shi,
  • J.J. Tian,
  • M.Y. Li,
  • L.X. Wang,
  • F.P. Zhao

Journal volume & issue
Vol. 18, no. 9
p. 101258

Abstract


The uncertainty resulting from missing genotypes in low-coverage whole-genome sequencing (LCWGS) data complicates genotype imputation. The aim of this study was to identify an optimal strategy for accurately imputing LCWGS data and to assess its effectiveness for genomic prediction (GP) and genome-wide association studies (GWAS) on economically important traits of Large White pigs. The LCWGS data of 1 423 Large White pigs were imputed using three different strategies: (1) using the high-coverage whole-genome sequencing (HCWGS) data of 30 key progenitors as the reference panel (Ref_LG); (2) mixing the HCWGS data of key progenitors with the LCWGS data (Mix_HLG); and (3) self-imputation within the LCWGS data (Within_LG). Additionally, to benchmark the imputation of LCWGS, we also imputed SNP chip data of the same 1 423 Large White pigs to the whole-genome sequencing level using the reference panel consisting of key progenitors (Ref_SNP). To evaluate the effects of the imputed sequencing data, we compared the accuracies of GP and the statistical power of GWAS for four reproductive traits based on the chip data, the sequencing data imputed from chip data, and the LCWGS data imputed using the optimal strategy. The average imputation accuracies of the Within_LG, Ref_LG and Mix_HLG strategies were 0.9893, 0.9899 and 0.9875, respectively, all higher than that of Ref_SNP (0.8522). Using the sequencing data imputed from LCWGS with the Ref_LG strategy, the accuracies of GP for the four traits improved by approximately 0.31–1.04% compared to the chip data, and by 0.7–1.05% compared to the sequencing data imputed from chip data.
Furthermore, using the sequence data imputed from LCWGS with the Ref_LG strategy, 18 candidate genes associated with the four reproductive traits of interest in Large White pigs were identified: total number of piglets born (EPC2, MBD5, ORC4 and ACVR2A); number of piglets born healthy (IKBKE); total litter weight of piglets born alive (HSPA13 and CPA1); and gestation length (GTF2H5, ITGAV, NFE2L2, CALCRL, ITGA4, STAT1, HOXD10, MSTN, COL5A2 and STAT4). With the exception of EPC2, ORC4, ACVR2A and MSTN, the others are novel candidates. Our findings can serve as a reference for the application of LCWGS data in livestock and poultry.

Keywords