Frontiers in Genetics (Aug 2016)
A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice
Abstract
One objective of this study was to provide readers with a clear and unified understanding ofparametric statistical and kernel methods, used for genomic prediction, and to compare some ofthese in the context of rice breeding for quantitative traits. Furthermore, another objective wasto provide a simple and user-friendly R package, named KRMM, which allows users to performRKHS regression with several kernels. After introducing the concept of regularized empiricalrisk minimization, the connections between well-known parametric and kernel methods suchas Ridge regression (i.e. genomic best linear unbiased predictor (GBLUP)) and reproducingkernel Hilbert space (RKHS) regression were reviewed. Ridge regression was then reformulatedso as to show and emphasize the advantage of the kernel trick concept, exploited by kernelmethods in the context of epistatic genetic architectures, over parametric frameworks used byconventional methods. Some parametric and kernel methods; least absolute shrinkage andselection operator (LASSO), GBLUP, support vector machine regression (SVR) and RKHSregression were thereupon compared for their genomic predictive ability in the context of ricebreeding using three real data sets. Among the compared methods, RKHS regression and SVRwere often the most accurate methods for prediction followed by GBLUP and LASSO. An Rfunction which allows users to perform RR-BLUP of marker effects, GBLUP and RKHS regression,with a Gaussian, Laplacian, polynomial or ANOVA kernel, in a reasonable computation time hasbeen developed. Moreover, a modified version of this function, which allows users to tune kernelsfor RKHS regression, has also been developed and parallelized for HPC Linux clusters. The corresponding KRMM package and all scripts have been made publicly available.
Keywords