Evaluation of Different SNP Analysis Software and Optimal Mining Process in Tree Species
Mengjia Bu,
Mengxuan Xu,
Shentong Tao,
Peng Cui,
Bing He
Affiliations
Mengjia Bu
Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Mengxuan Xu
Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
Shentong Tao
Co-Innovation Center for Sustainable Forestry in Southern China, Nanjing Forestry University, Nanjing 210037, China
Peng Cui
Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Bing He
Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Area, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
Single nucleotide polymorphism (SNP) is one of the most widely used molecular markers to help researchers understand the relationship between phenotypes and genotypes. SNP calling mainly consists of two steps, including read alignment and locus identification based on statistical models, and various software have been developed and applied in this issue. Meanwhile, in our study, very low agreement (<25%) was found among the prediction results generated by different software, which was much less consistent than expected. In order to obtain the optimal protocol of SNP mining in tree species, the algorithm principles of different alignment and SNP mining software were discussed in detail. And the prediction results were further validated based on in silico and experimental methods. In addition, hundreds of validated SNPs were provided along with some practical suggestions on program selection and accuracy improvement were provided, and we wish that these results could lay the foundation for the subsequent analysis of SNP mining.