The Plant Genome (Nov 2021)

Training set design in genomic prediction with multiple biparental families

  • Xintian Zhu,
  • Willmar L. Leiser,
  • Volker Hahn,
  • Tobias Würschum

DOI
https://doi.org/10.1002/tpg2.20124
Journal volume & issue
Vol. 14, no. 3

Abstract

Genomic selection is a powerful tool for shortening the breeding cycle and increasing the genetic gain for complex traits in plant breeding. However, questions remain about the optimum design and composition of the training set. In this study, we used 944 soybean [Glycine max (L.) Merr.] recombinant inbred lines from eight families derived through a partial–diallel mating design among five parental lines. The cross‐validated prediction accuracies for six traits (seed yield, 1,000‐seed weight, protein yield, plant height, protein content, and oil content) were high, ranging from 0.79 to 0.87. We investigated among‐family predictions, making use of the special mating design with different degrees of relatedness among families. Generally, the prediction accuracy decreased from full‐sib to half‐sib to unrelated families. However, half‐sib and unrelated families also varied substantially in how accurately they predicted a given family, which appeared to be caused at least in part by the shared segregation of quantitative trait loci in both the training and prediction sets. Combining several half‐sib families in composite training sets generally increased the prediction accuracy compared with the best single family. The prediction accuracy increased with the size of the training set, but substantially more half‐sibs than full‐sibs were required to reach a comparable prediction accuracy. Collectively, our results highlight the potential of genomic selection for soybean breeding and, in a broader context, illustrate the importance of the targeted design of the training set.
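
To make the idea of among‐family prediction concrete, the following is a minimal sketch of how a single‐family training set and a composite training set could be compared for one prediction family. It rests on several assumptions that are not taken from the abstract: the statistical model (ridge regression on marker genotypes) is a stand‐in, since the abstract does not name the model used; the function names `ridge_marker_effects` and `among_family_accuracy`, the penalty `lam`, and the family labels are hypothetical; and the genotypes and phenotypes are simulated at random, so they carry no real family structure and the printed values say nothing about the study's results. Prediction accuracy is taken here as the Pearson correlation between predicted and observed phenotypes.

```python
import numpy as np


def ridge_marker_effects(X, y, lam):
    """Estimate ridge-regression marker effects on centered data.

    Uses the dual form b = Xc' (Xc Xc' + lam I)^-1 (y - mean(y)), which is
    equivalent to the usual (Xc'Xc + lam I)^-1 Xc'(y - mean(y)) but cheaper
    when markers outnumber lines.
    """
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    K = Xc @ Xc.T + lam * np.eye(Xc.shape[0])
    b = Xc.T @ np.linalg.solve(K, y - y_mean)
    return b, x_mean, y_mean


def among_family_accuracy(X, y, family, train_families, test_family, lam=1.0):
    """Train on the listed families, predict another family, and return
    the correlation between predicted and observed phenotypes."""
    train = np.isin(family, train_families)
    test = family == test_family
    b, x_mean, y_mean = ridge_marker_effects(X[train], y[train], lam)
    pred = (X[test] - x_mean) @ b + y_mean
    return float(np.corrcoef(pred, y[test])[0, 1])


# Simulated stand-in data (assumption): 3 families of 50 lines, 200 markers
# coded -1/1, 10 causal markers with equal effects, plus residual noise.
rng = np.random.default_rng(0)
n_per_family, n_markers = 50, 200
family = np.repeat(np.array(["F1", "F2", "F3"]), n_per_family)
X = rng.choice([-1.0, 1.0], size=(family.size, n_markers))
beta = np.zeros(n_markers)
beta[:10] = 1.0
y = X @ beta + rng.normal(0.0, 1.0, size=family.size)

# Single-family versus composite training set for the same prediction family.
acc_single = among_family_accuracy(X, y, family, ["F1"], "F3")
acc_composite = among_family_accuracy(X, y, family, ["F1", "F2"], "F3")
print(acc_single, acc_composite)
```

With real data, `family` would hold the biparental family of each line, so that full‐sib, half‐sib, and unrelated training sets can be selected simply by changing `train_families`.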