Biology (Nov 2022)

Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization

  • Mang Liang,
  • Bingxing An,
  • Keanning Li,
  • Lili Du,
  • Tianyu Deng,
  • Sheng Cao,
  • Yueying Du,
  • Lingyang Xu,
  • Xue Gao,
  • Lupei Zhang,
  • Junya Li,
  • Huijiang Gao

DOI
https://doi.org/10.3390/biology11111647
Journal volume & issue
Vol. 11, no. 11
p. 1647

Abstract

Owing to its excellent prediction ability, machine learning has been considered the most powerful tool for analyzing high-throughput sequencing genome data. However, the complicated process of tuning hyperparameters greatly impedes the wider application of machine learning in animal and plant breeding programs. We therefore integrated an automatic hyperparameter tuning algorithm, the tree-structured Parzen estimator (TPE), with machine learning to simplify its use for genomic prediction (GP). In this study, we applied TPE to optimize the hyperparameters of kernel ridge regression (KRR) and support vector regression (SVR). To evaluate the performance of TPE, we compared the prediction accuracy of KRR-TPE and SVR-TPE with that of genomic best linear unbiased prediction (GBLUP) and of KRR-RS, KRR-Grid, SVR-RS, and SVR-Grid, in which the hyperparameters of KRR and SVR were tuned by random search (RS) and grid search (Grid), on a simulation dataset and on real datasets. The results indicated that, considering all populations, KRR-TPE achieved the strongest prediction ability and was the most convenient to use. In particular, for the Chinese Simmental beef cattle and Loblolly pine populations, the prediction accuracy of KRR-TPE showed average improvements of 8.73% and 6.08%, respectively, compared with GBLUP. Our study will greatly promote the application of machine learning in GP and further accelerate breeding progress.
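
The tuning idea named in the abstract can be illustrated with a minimal, simplified sketch of a one-dimensional TPE loop in plain Python. This is not the authors' implementation: the function names, the fixed kernel bandwidth, the quantile `gamma`, and the toy objective (a stand-in for the cross-validated error of KRR as a function of the log of its regularization strength) are all assumptions made for illustration. The sketch shows the core mechanism only: past trials are split into a "good" and a "bad" group, a Parzen (kernel density) estimator is fitted to each, and the next hyperparameter value is the candidate that maximizes the ratio of the two densities.

```python
import math
import random


def parzen_density(x, centers, bandwidth):
    """Parzen estimator: average of Gaussian kernels centred on past trials."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_minimize(objective, low, high, n_trials=60, n_startup=10,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified 1-D TPE loop (illustrative; all defaults are assumptions)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth, a simplification
    trials = []  # list of (loss, x) pairs
    for t in range(n_trials):
        if t < n_startup:
            # Warm-up phase: plain random search.
            x = rng.uniform(low, high)
        else:
            ranked = sorted(trials)  # lowest loss first
            n_good = max(1, math.ceil(gamma * len(ranked)))
            good = [x_ for _, x_ in ranked[:n_good]]
            bad = [x_ for _, x_ in ranked[n_good:]]
            # Draw candidates from the "good" density l(x), clipped to bounds.
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))
            # Keep the candidate with the largest ratio l(x) / g(x).
            x = max(
                candidates,
                key=lambda c: parzen_density(c, good, bandwidth)
                / (parzen_density(c, bad, bandwidth) + 1e-12),
            )
        trials.append((objective(x), x))
    best_loss, best_x = min(trials)
    return best_x, best_loss


def toy_validation_loss(log10_alpha):
    """Hypothetical stand-in for a cross-validated error curve."""
    return (log10_alpha + 1.5) ** 2 + 0.1


best_x, best_loss = tpe_minimize(toy_validation_loss, low=-5.0, high=3.0)
print(f"best log10(alpha) = {best_x:.3f}, loss = {best_loss:.4f}")
```

In practice, the toy objective would be replaced by a function that fits the model with the proposed hyperparameters and returns its cross-validated prediction error, and an established TPE implementation would normally be preferred over a hand-written loop.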

Keywords