Frontiers in Genetics (Nov 2023)
Improving predictive ability in sparse testing designs in soybean populations
Abstract
The availability of high-dimensional genomic data and advancements in genome-based prediction models (GP) have revolutionized and contributed to accelerated genetic gains in soybean breeding programs. GP-based sparse testing is a promising concept that allows increasing the testing capacity of genotypes in environments, of genotypes or environments at a fixed cost, or a substantial reduction of costs at a fixed testing capacity. This study represents the first attempt to implement GP-based sparse testing in soybeans by evaluating different training set compositions going from non-overlapped RILs until almost the other extreme of having same set of genotypes observed across environments for different training set sizes. A total of 1,755 recombinant inbred lines (RILs) tested in nine environments were used in this study. RILs were derived from 39 bi-parental populations of the Soybean Nested Association Mapping (NAM) project. The predictive abilities of various models and training set sizes and compositions were investigated. Training compositions included a range of ratios of overlapping (O-RILs) and non-overlapping (NO-RILs) RILs across environments, as well as a methodology to maximize or minimize the genetic diversity in a fixed-size sample. Reducing the training set size compromised predictive ability in most training set compositions. Overall, maximizing the genetic diversity within the training set and the inclusion of O-RILs increased prediction accuracy given a fixed training set size; however, the most complex model was less affected by these factors. More testing environments in the early stages of the breeding pipeline can provide a more comprehensive assessment of genotype stability and adaptation which are fundamental for the precise selection of superior genotypes adapted to a wide range of environments.
Keywords