Agriculture (Apr 2023)

Estimation of Error Variance in Genomic Selection for Ultrahigh Dimensional Data

  • Sayanti Guha Majumdar,
  • Anil Rai,
  • Dwijesh Chandra Mishra

DOI
https://doi.org/10.3390/agriculture13040826
Journal volume & issue
Vol. 13, no. 4
p. 826

Abstract

Read online

Estimation of error variance in the case of genomic selection is a necessary step to measure the accuracy of the genomic selection model. For genomic selection, whole-genome high-density marker data is used where the number of markers is always larger than the sample size. This makes it difficult to estimate the error variance because the ordinary least square estimation technique cannot be used in the case of datasets where the number of parameters is greater than the number of individuals (i.e., p > n). In this article, two existing methods, viz. Refitted Cross Validation (RCV) and kfold-RCV, were suggested for such cases. Moreover, by considering the limitations of the above methods, two new methods, viz. Bootstrap-RCV and Ensemble method, have been proposed. Furthermore, an R package “varEst” has been developed, which contains four different functions to implement these error variance estimation methods in the case of Least Absolute Shrinkage and Selection Operator (LASSO), Least Squares Regression (LSR) and Sparse Additive Models (SpAM). The performances of the algorithms have been evaluated using simulated and real datasets.

Keywords