Ability of Genomic Prediction to Bi-Parent-Derived Breeding Population Using Public Data for Soybean Oil and Protein Content
Chenhui Li,
Qing Yang,
Bingqiang Liu,
Xiaolei Shi,
Zhi Liu,
Chunyan Yang,
Tao Wang,
Fuming Xiao,
Mengchen Zhang,
Ainong Shi,
Long Yan
Affiliations
Chenhui Li
College of Life Sciences, Hebei Agricultural University, Baoding 071001, China
Qing Yang
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Bingqiang Liu
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Xiaolei Shi
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Zhi Liu
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Chunyan Yang
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Tao Wang
Handan Academy of Agricultural Science, Handan 056001, China
Fuming Xiao
Handan Academy of Agricultural Science, Handan 056001, China
Mengchen Zhang
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Ainong Shi
Department of Horticulture, University of Arkansas, Fayetteville, AR 72701, USA
Long Yan
Hebei Laboratory of Crop Genetics and Breeding, National Soybean Improvement Center Shijiazhuang Sub-Center, Huang-Huai-Hai Key Laboratory of Biology and Genetic Improvement of Soybean, Ministry of Agriculture and Rural Affairs, Institute of Cereal and Oil Crops, Hebei Academy of Agricultural and Forestry Sciences, High-Tech Industrial Development Zone, 162 Hengshan St., Shijiazhuang 050035, China
Genomic selection (GS) is a marker-based selection method used to improve the genetic gain of quantitative traits in plant breeding. A large number of breeding datasets are available in the soybean database, and the application of these public datasets in GS will improve breeding efficiency and reduce time and cost. However, the most important problem to be solved is how to improve the ability of across-population prediction. The objectives of this study were to perform genomic prediction (GP) and estimate the prediction ability (PA) for seed oil and protein contents in soybean using available public datasets to predict breeding populations in current, ongoing breeding programs. In this study, six public datasets of USDA GRIN soybean germplasm accessions with available phenotypic data of seed oil and protein contents from different experimental populations and their genotypic data of single-nucleotide polymorphisms (SNPs) were used to perform GP and to predict a bi-parent-derived breeding population in our experiment. The average PA was 0.55 and 0.50 for seed oil and protein contents within the bi-parents population according to the within-population prediction; and 0.45 for oil and 0.39 for protein content when the six USDA populations were combined and employed as training sets to predict the bi-parent-derived population. The results showed that four USDA-cultivated populations can be used as a training set individually or combined to predict oil and protein contents in GS when using 800 or more USDA germplasm accessions as a training set. The smaller the genetic distance between training population and testing population, the higher the PA. The PA increased as the population size increased. In across-population prediction, no significant difference was observed in PA for oil and protein content among different models. The PA increased as the SNP number increased until a marker set consisted of 10,000 SNPs. This study provides reasonable suggestions and methods for breeders to utilize public datasets for GS. It will aid breeders in developing GS-assisted breeding strategies to develop elite soybean cultivars with high oil and protein contents.