Crop Journal (Apr 2022)

A novel genomic prediction method combining randomized Haseman-Elston regression with a modified algorithm for Proven and Young for large genomic data

  • Hailan Liu,
  • Guo-Bo Chen

Journal volume & issue
Vol. 10, no. 2
pp. 550 – 554

Abstract

Computational efficiency has become a key issue in genomic prediction (GP) owing to the massive historical datasets that have accumulated. Here we developed a new super-fast GP approach (SHEAPY) that combines randomized Haseman-Elston regression (RHE-reg) with a modified Algorithm for Proven and Young (APY) in an additive-effect model, using the former to estimate heritability and the latter to invert a large genomic relationship matrix for best linear prediction. In simulations with training populations of varied size, GBLUP, HEAPY|A and SHEAPY showed similar predictive performance when the core population was half the size of a large training population and the heritability was fixed, and SHEAPY was computationally faster than both GBLUP and HEAPY|A. In simulations with varied heritability, SHEAPY showed better predictive ability than GBLUP in all cases and than HEAPY|A in most cases when the core population was 4/5 the size of a small training population and the training population size was fixed. As a proof of concept, SHEAPY was applied to two real datasets. In an Arabidopsis thaliana F2 population, the predictive performance of SHEAPY was similar to or better than that of GBLUP and HEAPY|A in most cases when the core population (200) was 2/3 the size of a small training population (300). In a sorghum multiparental population, SHEAPY showed higher predictive accuracy than HEAPY|A for all three traits, and than GBLUP for two traits. SHEAPY may become the GP method of choice for large-scale genomic data.
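
The two building blocks named in the abstract can be illustrated with a minimal sketch. It assumes a column-standardized genotype matrix `Z` (individuals by markers) and the textbook forms of both techniques: a method-of-moments Haseman-Elston estimator whose trace term is approximated with random probe vectors, and the standard APY inverse. The paper's modified APY is not described in the abstract and is not reproduced here. The function names, the `n_vectors` parameter and the choice of core individuals are illustrative assumptions.

```python
import numpy as np


def rhe_heritability(Z, y, n_vectors=50, seed=0):
    """Randomized Haseman-Elston estimate of h2 without forming G = Z Z^T / m.

    Z: (n, m) column-standardized genotypes (assumed); y: (n,) phenotypes.
    """
    n, m = Z.shape
    rng = np.random.default_rng(seed)
    y = (y - y.mean()) / y.std()

    # tr(G) is exact: squared Frobenius norm of Z divided by m.
    tr_G = np.sum(Z * Z) / m

    # Hutchinson-type estimate: tr(G^2) = E[||G w||^2] for w ~ N(0, I).
    W = rng.standard_normal((n, n_vectors))
    GW = Z @ (Z.T @ W) / m
    tr_G2 = np.sum(GW * GW) / n_vectors

    # Moment equations from E[y y^T] = sg2 * G + se2 * I.
    yGy = np.sum((Z.T @ y) ** 2) / m
    A = np.array([[tr_G2, tr_G], [tr_G, float(n)]])
    b = np.array([yGy, y @ y])
    sg2, se2 = np.linalg.solve(A, b)
    return sg2 / (sg2 + se2)


def apy_inverse(G, core):
    """Standard APY approximation to the inverse of G (not the paper's variant).

    Only the core block is inverted directly; non-core individuals are
    treated as conditionally independent given the core.
    """
    n = G.shape[0]
    core = np.asarray(core)
    noncore = np.setdiff1d(np.arange(n), core)

    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn  # regression of non-core on core

    # Diagonal conditional variances: g_ii - g_ic Gcc^{-1} g_ci.
    m_diag = G[noncore, noncore] - np.sum(Gcn * P, axis=0)

    PM = P / m_diag  # P @ diag(1 / m_diag)
    Ginv = np.zeros((n, n))
    Ginv[np.ix_(core, core)] = Gcc_inv + PM @ P.T
    Ginv[np.ix_(core, noncore)] = -PM
    Ginv[np.ix_(noncore, core)] = -PM.T
    Ginv[noncore, noncore] = 1.0 / m_diag
    return Ginv
```

Under these assumptions, an estimated heritability `h2` gives the ratio `lam = (1 - h2) / h2`, and breeding values in a model without fixed effects would follow from solving `(I + lam * Ginv) u = y`. The cost of the inversion is dominated by the core block, which is why the core population size matters in the comparisons above.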

Keywords