BMC Bioinformatics (Jan 2022)

Effective identification of varieties by nucleotide polymorphisms and its application for essentially derived variety identification in rice

  • Xiong Yuan,
  • Zirong Li,
  • Liwen Xiong,
  • Sufeng Song,
  • Xingfei Zheng,
  • Zhonghai Tang,
  • Zheming Yuan,
  • Lanzhi Li

DOI
https://doi.org/10.1186/s12859-022-04562-9
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 16

Abstract

Background: Plant variety identification is one of the most important aspects of agricultural systems. It requires DNA marker profiles of released varieties that can be compared with candidate or future varieties. Strictly speaking, however, most existing variety identification techniques have been used not for “identification” but for “distinction of a limited number of cultivars,” and their generalization ability has rarely been well estimated. Many varieties have similar genetic backgrounds, and some are even essentially derived varieties (EDVs), which complicates identification and hinders breeding progress. A fast, accurate variety identification method that also performs well in EDV determination needs to be developed.

Results: In this study, following a “Divide and Conquer” strategy, a variety identification method, Conditional Random Selection (CRS), was developed based on whole-genome SNPs of 3024 rice varieties and applied to EDV identification in rice. CRS is a fast, efficient, and automated variety identification method. In practice, with the optimal identity score threshold found in this study, the resulting SNP set (390 SNPs) showed optimal performance in EDV and non-EDV identification on two independent testing datasets.

Conclusion: This approach first selected a minimal set of SNPs to discriminate non-EDVs in the 3000 Rice Genome Project, then united several simplified SNP sets to improve generalization ability for EDV and non-EDV identification in the testing datasets. The results suggest that the CRS method outperformed traditional feature selection methods. Furthermore, it provides a new way to screen core SNP loci from the whole genome for DNA fingerprinting of crop varieties and should be useful for crop breeding.
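As an illustration of how an identity score threshold can separate EDVs from non-EDVs, the following is a minimal sketch and not the authors' implementation. It assumes the identity score is the fraction of SNP loci at which two varieties carry the same call; the abstract does not give the exact definition or the optimal threshold, so the function names, the `MISSING` placeholder, and the default threshold of 0.9 are all hypothetical.

```python
from typing import Sequence

MISSING = "N"  # assumed placeholder for a missing genotype call


def identity_score(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of SNP loci with identical calls in both profiles.

    Loci with a missing call in either profile are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same SNP panel")
    compared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not compared:
        raise ValueError("no loci with calls in both profiles")
    matches = sum(x == y for x, y in compared)
    return matches / len(compared)


def is_edv(candidate: Sequence[str], reference: Sequence[str],
           threshold: float = 0.9) -> bool:
    """Flag the candidate as a putative EDV of the reference variety.

    The threshold is a hypothetical value, not the one reported in the paper.
    """
    return identity_score(candidate, reference) >= threshold


if __name__ == "__main__":
    reference = list("ACGTACGT")
    candidate = list("ACGTACGA")  # differs from the reference at one locus
    print(identity_score(candidate, reference))
    print(is_edv(candidate, reference, threshold=0.8))
```

In this sketch a candidate is compared with one reference variety at a time; screening against a full panel of released varieties would repeat the comparison for each reference profile.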

Keywords