Acta Scientiarum: Agronomy (Nov 2024)
Two-step genomic prediction using artificial neural networks - an effective strategy for reducing computational costs and increasing prediction accuracy
Abstract
Artificial neural networks (ANNs) are powerful nonparametric tools for estimating genomic breeding values (GEBVs) in genetic breeding. One significant advantage of ANNs is their ability to make predictions without requiring prior assumptions about data distribution or the relationship between genotype and phenotype. However, ANNs come with a high computational cost, and their predictions may be underestimated when including all molecular markers. This study proposes a two-step genomic prediction procedure using ANNs to address these challenges. Initially, molecular markers were selected either directly through Multivariate Adaptive Regression Splines (MARS) or indirectly based on their importance, identified through Boosting, considering the top 5, 20, and 50% of markers with the highest significance. Subsequently, the selected markers were employed for genomic prediction using ANNs. This approach was applied to two simulated traits: one with ten trait-controlling loci and heritability of 0.4 (Scenario SC1) and the other with 100 trait-controlling loci and a heritability of 0.2 (Scenario SC2). Comparisons were made between ANN predictions using marker selection and those without any marker selection. Reducing the number of markers proved to be an efficient strategy, resulting in improved accuracy, reduced mean squared error (MSE), and shorter adjustment times. The best ANN predictions were obtained with ten markers selected by MARS in SC1, and the top 5% most relevant markers selected using Boosting in SC2. As a result, in SC1, predictions using MARS achieved over a 31% increase in accuracy and a 90% reduction in MSE. In SC2, predictions using Boosting resulted in more than a 15% increase in accuracy and an 83% reduction in MSE. For both scenarios, computational time was up to ten times shorter with marker selection. Overall, the two-step prediction procedure emerged as an effective strategy for enhancing the computational and predictive performance of ANN models.
Keywords