Improving predictive ability in sparse testing designs in soybean populations

Reyna Persa; Caio Canella Vieira; Esteban Rios; Valerio Hoyos-Villegas; Carlos D. Messina; Daniel Runcie; Diego Jarquin

doi:10.3389/fgene.2023.1269255

Frontiers in Genetics (Nov 2023)

Affiliations

Reyna Persa: Agronomy Department, University of Florida, Gainesville, FL, United States
Caio Canella Vieira: Crop, Soil, and Environmental Sciences, Bumpers College, University of Arkansas, Fayetteville, AR, United States
Esteban Rios: Agronomy Department, University of Florida, Gainesville, FL, United States
Valerio Hoyos-Villegas: Department of Plant Science, McGill University, Montreal, QC, Canada
Carlos D. Messina: Horticultural Sciences Department, University of Florida, Gainesville, FL, United States
Daniel Runcie: Department of Plant Sciences, University of California Davis, Davis, CA, United States
Diego Jarquin: Agronomy Department, University of Florida, Gainesville, FL, United States

DOI: https://doi.org/10.3389/fgene.2023.1269255
Journal volume & issue: Vol. 14

Abstract

The availability of high-dimensional genomic data and advances in genome-based prediction (GP) models have accelerated genetic gains in soybean breeding programs. GP-based sparse testing is a promising concept that either increases the number of genotypes, environments, or both that can be tested at a fixed cost, or substantially reduces costs at a fixed testing capacity. This study represents the first attempt to implement GP-based sparse testing in soybean. It evaluates, for different training set sizes, training set compositions ranging from entirely non-overlapping recombinant inbred lines (RILs) to nearly the opposite extreme, in which the same set of genotypes is observed across environments. A total of 1,755 RILs tested in nine environments were used in this study. The RILs were derived from 39 bi-parental populations of the Soybean Nested Association Mapping (NAM) project. The predictive abilities of various models, training set sizes, and training set compositions were investigated. Training compositions included a range of ratios of overlapping (O-RILs) to non-overlapping (NO-RILs) RILs across environments, as well as a methodology to maximize or minimize the genetic diversity in a fixed-size sample. Reducing the training set size compromised predictive ability in most training set compositions. Overall, maximizing the genetic diversity within the training set and including O-RILs increased prediction accuracy at a fixed training set size; however, the most complex model was less affected by these factors. More testing environments in the early stages of the breeding pipeline can provide a more comprehensive assessment of genotype stability and adaptation, which are fundamental for the precise selection of superior genotypes adapted to a wide range of environments.
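
As a rough illustration of what a training layout mixing O-RILs and NO-RILs could look like, the following Python sketch builds a RIL-by-environment incidence matrix. It is not the authors' procedure: the function name `sparse_allocation`, the purely random assignment, and the counts of 100 overlapping RILs and 100 non-overlapping RILs per environment are assumptions chosen for illustration. Only the totals of 1,755 RILs and nine environments come from the abstract.

```python
import numpy as np


def sparse_allocation(n_rils, n_envs, n_overlap, n_non_overlap_per_env, seed=0):
    """Return a boolean (n_rils x n_envs) matrix; True means "RIL tested in env".

    Hypothetical helper: overlapping RILs are observed in every environment,
    and each non-overlapping RIL is observed in exactly one environment.
    """
    needed = n_overlap + n_envs * n_non_overlap_per_env
    if needed > n_rils:
        raise ValueError("not enough RILs for the requested layout")

    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rils)  # random order of RIL indices

    design = np.zeros((n_rils, n_envs), dtype=bool)

    # O-RILs: the same lines appear in all environments.
    design[order[:n_overlap], :] = True

    # NO-RILs: disjoint, equally sized blocks, one block per environment.
    rest = order[n_overlap:needed]
    for env, block in enumerate(np.split(rest, n_envs)):
        design[block, env] = True

    return design


# Illustrative numbers only (assumed, not taken from the study design).
design = sparse_allocation(1755, 9, n_overlap=100, n_non_overlap_per_env=100)
print("RILs tested per environment:", design.sum(axis=0))
print("Total field plots:", int(design.sum()))
```

Varying `n_overlap` against `n_non_overlap_per_env` while holding the per-environment total fixed gives the kind of O-RIL to NO-RIL ratio the abstract refers to. RILs left unassigned in every environment would be the candidates predicted by a GP model.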

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics
