Fast genomic prediction of breeding values using parallel Markov chain Monte Carlo with convergence diagnosis

Peng Guo; Bo Zhu; Hong Niu; Zezhao Wang; Yonghu Liang; Yan Chen; Lupei Zhang; Hemin Ni; Yong Guo; El Hamidi A. Hay; Xue Gao; Huijiang Gao; Xiaolin Wu; Lingyang Xu; Junya Li

doi:10.1186/s12859-017-2003-3

BMC Bioinformatics (Jan 2018)

Affiliations

Peng Guo: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Bo Zhu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Hong Niu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Zezhao Wang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Yonghu Liang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Yan Chen: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Lupei Zhang: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Hemin Ni: Animal Science and Technology College, Beijing University of Agriculture
Yong Guo: Animal Science and Technology College, Beijing University of Agriculture
El Hamidi A. Hay: Livestock and Range Research Laboratory, ARS, USDA
Xue Gao: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Huijiang Gao: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Xiaolin Wu: Biostatistics and Bioinformatics, GeneSeek (A Neogen company)
Lingyang Xu: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences
Junya Li: Laboratory of Molecular Biology and Bovine Breeding, Institute of Animal Science, Chinese Academy of Agricultural Sciences

Journal volume & issue: Vol. 19, no. 1
pp. 1 – 11

Abstract

Background Running multiple-chain Markov chain Monte Carlo (MCMC) provides an efficient parallel computing method for complex Bayesian models, although the efficiency of the approach critically depends on the length of the non-parallelizable burn-in period, for which all simulated data are discarded. In practice, this burn-in period is set arbitrarily, which often leads to running far more iterations than required. In addition, the accuracy of genomic predictions does not improve after the MCMC reaches equilibrium.
Results Automatic tuning of the burn-in length for running multiple-chain MCMC was proposed in the context of genomic predictions using BayesA and BayesCπ models. The performance of parallel computing versus sequential computing, and of tunable burn-in MCMC versus fixed burn-in MCMC, was assessed using simulated data sets as well as by applying these methods to genomic predictions in a Chinese Simmental beef cattle population. The results showed that tunable burn-in parallel MCMC had greater speedups than fixed burn-in parallel MCMC, and both had greater speedups relative to sequential (single-chain) MCMC. Nevertheless, genomic estimated breeding values (GEBVs) and genomic prediction accuracies were highly comparable between the various computing approaches. When applied to the genomic predictions of four quantitative traits in a Chinese Simmental population of 1217 beef cattle genotyped by an Illumina Bovine 770 K SNP BeadChip, tunable burn-in multiple-chain BayesCπ (TBM-BayesCπ) outperformed tunable burn-in multiple-chain BayesA (TBM-BayesA) and Genomic Best Linear Unbiased Prediction (GBLUP) in terms of prediction accuracy, although the differences were not necessarily caused by computational factors and could have been intrinsic to the statistical models per se.
Conclusions Automatically tunable burn-in multiple-chain MCMC provides an accurate and cost-effective tool for high-performance computing of Bayesian genomic prediction models, and this algorithm is generally applicable to high-performance computing of any complex Bayesian statistical model.
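
The idea of tuning the burn-in from a convergence diagnostic, and the reason the burn-in length limits parallel speedup, can be illustrated with a minimal Python sketch. The abstract does not name the diagnostic or the cost model, so everything here is an assumption for illustration: the Gelman-Rubin potential scale reduction factor is used as a stand-in diagnostic, the 1.1 threshold is a common rule of thumb rather than a value from the paper, the function names (`gelman_rubin`, `tune_burn_in`, `parallel_speedup`) are hypothetical, and the speedup formula assumes equal cost per iteration, one full burn-in per chain, and post-burn-in samples split evenly across chains.

```python
import numpy as np


def gelman_rubin(chains):
    """Potential scale reduction factor for an (m chains, n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance (W)
    between = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance (B)
    pooled = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(pooled / within))


def tune_burn_in(chains, check_every=100, threshold=1.1):
    """Return the first checked iteration t at which the diagnostic, computed
    on the second half of the draws so far, drops below `threshold`.
    Returns None if the chains never pass the check."""
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    for t in range(check_every, n + 1, check_every):
        window = chains[:, t // 2:t]                 # discard the first half
        if gelman_rubin(window) < threshold:
            return t
    return None


def parallel_speedup(burn_in, n_samples, n_chains):
    """Idealized speedup of n_chains parallel chains over one sequential chain.
    Every chain repeats the burn-in; only the retained samples are divided."""
    sequential_cost = burn_in + n_samples
    parallel_cost = burn_in + n_samples / n_chains
    return sequential_cost / parallel_cost
```

Under this simplified cost model, four chains that must collect 9000 retained samples reach a speedup of about 3.08 with a burn-in of 1000 iterations, and a shorter burn-in moves the speedup closer to the number of chains, which is the motivation for stopping the burn-in as soon as the chains appear to have converged.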

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
