A Two-Stage Mutual Information Based Bayesian Lasso Algorithm for Multi-Locus Genome-Wide Association Studies
Hongping Guo,
Zuguo Yu,
Jiyuan An,
Guosheng Han,
Yuanlin Ma,
Runbin Tang
Affiliations
Hongping Guo
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Zuguo Yu
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Jiyuan An
Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, QLD 4001, Australia
Guosheng Han
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Yuanlin Ma
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Runbin Tang
Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan 411105, China
Genome-wide association study (GWAS) has turned out to be an essential technology for exploring the genetic mechanism of complex traits. To reduce the complexity of computation, it is well accepted to remove unrelated single nucleotide polymorphisms (SNPs) before GWAS, e.g., by using iterative sure independence screening expectation-maximization Bayesian Lasso (ISIS EM-BLASSO) method. In this work, a modified version of ISIS EM-BLASSO is proposed, which reduces the number of SNPs by a screening methodology based on Pearson correlation and mutual information, then estimates the effects via EM-Bayesian Lasso (EM-BLASSO), and finally detects the true quantitative trait nucleotides (QTNs) through likelihood ratio test. We call our method a two-stage mutual information based Bayesian Lasso (MBLASSO). Under three simulation scenarios, MBLASSO improves the statistical power and retains the higher effect estimation accuracy when comparing with three other algorithms. Moreover, MBLASSO performs best on model fitting, the accuracy of detected associations is the highest, and 21 genes can only be detected by MBLASSO in Arabidopsis thaliana datasets.