BMC Genomics (Nov 2024)
Identifying low-density, ancestry-informative SNP markers through whole genome resequencing in Indian, Chinese, and wild yak
Abstract
The current investigation was undertaken to identify population-stratifying and ancestry-informative markers in Indian, Chinese, and wild yak populations using whole genome resequencing (WGS) analysis, employing various selection strategies (Delta, pairwise Wright’s fixation index (F_ST), and Informativeness of Assignment) and marker densities (5–25 thousand). The study used WGS data on 105 individuals from three separate yak cohorts, i.e., Indian yak (n = 29), Chinese yak (n = 61), and wild yak (n = 15). Variant calling in the GATK program with strict quality control resulted in 1,002,970 high-quality, independent (LD-pruned) SNP markers across the yak autosomes. Analysis was undertaken in the Toolbox for Ranking and Evaluation of SNPs (TRES) program, wherein three criteria, i.e., Delta, pairwise F_ST, and Informativeness of Assignment, were employed to identify population-stratifying and ancestry-informative markers across various datasets. The top-ranked 5,000 (5K), 10,000 (10K), 15,000 (15K), 20,000 (20K), and 25,000 (25K) SNPs were identified from each dataset, and their composition and performance were assessed using different criteria. The average genomic breed clustering of the Indian, Chinese, and wild yak cohorts with the full-density dataset (105 individuals with 1,002,970 markers) was 81.74%, 80.02%, and 83.62%, respectively. The Informativeness of Assignment criterion at 10K density emerged as the best combination for the three yak cohorts, with 86.94%, 96.46%, and 98.20% clustering for Indian, Chinese, and wild yak, respectively. There was an average increase of 7.56%, 22.72%, and 30.35% in the genomic breed clustering scores of the Indian, Chinese, and wild yak cohorts relative to the estimates from the original dataset. The selected markers overlapped multiple protein-coding genes within a 10 kb window, including ADGRB3, ANK1, CACNG7, CALN1, CHCHD2, CREBBP, GLI3, KHDRBS2, and OSBPL10.
This is the first report to identify low-density SNP marker sets with population-stratifying and ancestry-informative properties in three yak groups using WGS data. The results are significant for applying genomic selection with cost-effective low-density SNP panels in yak populations worldwide.
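The three ranking criteria named above can be illustrated with a minimal Python sketch. It is not the TRES implementation, and it rests on several assumptions: SNPs are biallelic, each SNP is summarized by one reference-allele frequency per cohort, Delta is the absolute allele-frequency difference between two cohorts, F_ST is the simple heterozygosity-based form (H_T - H_S) / H_T, and Informativeness of Assignment follows the commonly cited I_n definition with equally weighted populations. The function names, the averaging of pairwise scores across cohort pairs, and the toy SNP identifiers and frequencies are all hypothetical.

```python
import math
from itertools import combinations


def delta(p1, p2):
    """Absolute allele-frequency difference between two populations."""
    return abs(p1 - p2)


def pairwise_fst(p1, p2):
    """Heterozygosity-based F_ST for one biallelic SNP in two populations.

    Uses (H_T - H_S) / H_T with equal population weights; this is a
    simplifying assumption, not necessarily the estimator used by TRES.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # monomorphic in both populations: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def _xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness(freqs):
    """Informativeness for assignment (I_n) of one biallelic SNP.

    freqs holds the reference-allele frequency in each of K populations.
    """
    k = len(freqs)
    total = 0.0
    # Sum over both alleles: the reference allele and its complement.
    for allele_freqs in (freqs, [1 - p for p in freqs]):
        p_bar = sum(allele_freqs) / k
        total += -_xlogx(p_bar) + sum(_xlogx(p) for p in allele_freqs) / k
    return total


def mean_pairwise(score, freqs):
    """Average a two-population score over all population pairs (assumption)."""
    pairs = list(combinations(freqs, 2))
    return sum(score(a, b) for a, b in pairs) / len(pairs)


def rank_snps(freq_table, scorer, top_n):
    """Return the top_n SNP ids, ordered from highest to lowest score."""
    ranked = sorted(freq_table, key=lambda s: scorer(freq_table[s]), reverse=True)
    return ranked[:top_n]


# Hypothetical frequencies, ordered as (Indian, Chinese, wild).
toy = {
    "snp_a": (0.9, 0.1, 0.1),
    "snp_b": (0.5, 0.5, 0.5),
    "snp_c": (0.6, 0.4, 0.5),
}

top_by_in = rank_snps(toy, informativeness, top_n=2)
top_by_delta = rank_snps(toy, lambda f: mean_pairwise(delta, f), top_n=2)
top_by_fst = rank_snps(toy, lambda f: mean_pairwise(pairwise_fst, f), top_n=2)
```

In this sketch, selecting a 10K panel would amount to calling `rank_snps` with `top_n=10000` on the full frequency table; a SNP with identical frequencies in all cohorts scores zero under every criterion and is therefore ranked last.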
Keywords