Genetics Selection Evolution (May 2020)

Bayesian neural networks with variable selection for prediction of genotypic values

  • Giel H. H. van Bergen,
  • Pascal Duenk,
  • Cornelis A. Albers,
  • Piter Bijma,
  • Mario P. L. Calus,
  • Yvonne C. J. Wientjes,
  • Hilbert J. Kappen

DOI
https://doi.org/10.1186/s12711-020-00544-8
Journal volume & issue
Vol. 52, no. 1
pp. 1 – 14

Abstract

Background
Estimating the genetic component of a complex phenotype is a complicated problem, mainly because there are many allele effects to estimate from a limited number of phenotypes. In spite of this difficulty, linear methods with variable selection have been able to give good predictions of additive effects of individuals. However, prediction of non-additive genetic effects is challenging with the usual prediction methods. In machine learning, non-additive relations between inputs can be modeled with neural networks. We developed a novel method (NetSparse) that uses Bayesian neural networks with variable selection for the prediction of genotypic values of individuals, including non-additive genetic effects.

Results
We simulated several populations with different phenotypic models and compared NetSparse to genomic best linear unbiased prediction (GBLUP), BayesB, their dominance variants, and an additive by additive method. We found that when the number of QTL was relatively small (10 or 100), NetSparse had 2 to 28 percentage points higher accuracy than the reference methods. For scenarios that included dominance or epistatic effects, NetSparse had 0.0 to 3.9 percentage points higher accuracy for predicting phenotypes than the reference methods, except in scenarios with extreme overdominance, for which reference methods that explicitly model dominance had 6 percentage points higher accuracy than NetSparse.

Conclusions
Bayesian neural networks with variable selection are promising for prediction of the genetic component of complex traits in animal breeding, and their performance is robust across different genetic models. However, their large computational costs can hinder their use in practice.