BMC Genomics (Oct 2021)

A linkage disequilibrium-based approach to position unmapped SNPs in crop species

  • Seema Yadav,
  • Elizabeth M. Ross,
  • Karen S. Aitken,
  • Lee T. Hickey,
  • Owen Powell,
  • Xianming Wei,
  • Kai P. Voss-Fels,
  • Ben J. Hayes

DOI
https://doi.org/10.1186/s12864-021-08116-w
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 9

Abstract

Read online

Abstract Background High-density SNP arrays are now available for a wide range of crop species. Despite the development of many tools for generating genetic maps, the genome position of many SNPs from these arrays is unknown. Here we propose a linkage disequilibrium (LD)-based algorithm to allocate unassigned SNPs to chromosome regions from sparse genetic maps. This algorithm was tested on sugarcane, wheat, and barley data sets. We calculated the algorithm’s efficiency by masking SNPs with known locations, then assigning their position to the map with the algorithm, and finally comparing the assigned and true positions. Results In the 20-fold cross-validation, the mean proportion of masked mapped SNPs that were placed by the algorithm to a chromosome was 89.53, 94.25, and 97.23% for sugarcane, wheat, and barley, respectively. Of the markers that were placed in the genome, 98.73, 96.45 and 98.53% of the SNPs were positioned on the correct chromosome. The mean correlations between known and new estimated SNP positions were 0.97, 0.98, and 0.97 for sugarcane, wheat, and barley. The LD-based algorithm was used to assign 5920 out of 21,251 unpositioned markers to the current Q208 sugarcane genetic map, representing the highest density genetic map for this species to date. Conclusions Our LD-based approach can be used to accurately assign unpositioned SNPs to existing genetic maps, improving genome-wide association studies and genomic prediction in crop species with fragmented and incomplete genome assemblies. This approach will facilitate genomic-assisted breeding for many orphan crops that lack genetic and genomic resources.

Keywords