پژوهش های علوم دامی (Animal Science Research), Mar 2022
Evaluation of selected methods related to Genome-Wide Association Studies for identification of gene locus
Abstract
Introduction: Because Single Nucleotide Polymorphisms (SNPs) are widely distributed throughout the genome, these markers are widely used in livestock breeding research. They have been used to predict disease risk in humans, to localize genetic variants responsible for complex traits through genome-wide association studies (GWAS), and to predict the genetic values of economically important traits in plant and animal breeding (Zhang et al. 2015). Whole-genome scanning methods fall mainly into two groups: single-SNP genome-wide association studies (SSGWAS) and multiple-marker methods. SSGWAS can identify a large number of common variants affecting quantitative traits; however, a large proportion of the genetic variance remains unexplained (Shirali et al. 2018). For quantitative traits, the proportion of phenotypic variance explained by SNPs is related to the number of adjacent SNPs in a genomic region, and the heritability attributable to such a region is defined as the regional heritability. The Regional Heritability Mapping (RHM) method identifies small genomic regions associated with the trait and can capture more of the missing genetic variation (Nagamine et al. 2012). RHM uses a mixed model framework based on Restricted Maximum Likelihood (REML) in which two variance components, one contributed by the whole genome and one by a specific genomic region, are fitted to estimate genomic and regional heritabilities, respectively (Uemoto et al. 2013). In addition, the fast and flexible set-Based Association Test (fastBAT) performs a fast set-based association analysis (Bakshi et al. 2016). The purpose of this study is to compare the SNPs and regions identified by these genome-wide association methods, to compare the results with the simulated Quantitative Trait Loci (QTL), and to determine the false positive results of each method.
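The two models described above can be summarized as follows. This is a minimal sketch in our own notation (an assumption; the abstract gives no equations). The RHM part follows the general two-component form described by Nagamine et al. (2012) and Uemoto et al. (2013) and omits the pedigree relationship matrix that the methods also mention. The fastBAT part reflects our understanding of Bakshi et al. (2016): the set statistic is the sum of single-SNP χ² statistics, and its null distribution depends on LD among the SNPs in the set. Here G is the whole-genome relationship matrix, G_r the regional matrix built from the SNPs of one window, z_i the single-SNP z-scores, R the LD correlation matrix of the m SNPs in the set, and λ_j its eigenvalues.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{g} + \mathbf{r} + \mathbf{e}, \\
\mathbf{g} &\sim N(\mathbf{0}, \mathbf{G}\,\sigma^2_g), \quad
\mathbf{r} \sim N(\mathbf{0}, \mathbf{G}_r\,\sigma^2_r), \quad
\mathbf{e} \sim N(\mathbf{0}, \mathbf{I}\,\sigma^2_e), \\
h^2_g &= \frac{\sigma^2_g}{\sigma^2_g + \sigma^2_r + \sigma^2_e}, \qquad
h^2_r = \frac{\sigma^2_r}{\sigma^2_g + \sigma^2_r + \sigma^2_e}, \\[4pt]
Q &= \sum_{i=1}^{m} z_i^2, \qquad
\mathbf{z} \sim N(\mathbf{0}, \mathbf{R}) \ \text{under } H_0
\;\Rightarrow\; Q \sim \sum_{j=1}^{m} \lambda_j \chi^2_{1,j}
\end{aligned}
```

In RHM this model is refitted for every window, so h²_r is estimated region by region while h²_g absorbs the rest of the genome.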
Materials and methods: In this study, markers and populations were simulated as a forward-in-time process using QMSim software (Sargolzaei and Schenkel 2009). The simulated genome comprised 27,586 SNPs on 3 pairs of autosomes. Simulation was performed under 3 scenarios with 75, 150 and 300 QTLs, with 10 replicates per scenario. After quality control, the number of SNPs retained for analysis ranged from 19,662 to 23,817. In all scenarios, heritability was 0.2, contributed equally by polygenic and QTL effects. Whole-genome and pedigree-based genetic relationship matrices were used in all 3 methods to estimate genetic parameters. The whole-genome relationship matrix, built from all SNPs, was used to estimate whole-genome additive effects, and a regional genomic relationship matrix was used to estimate the additive effect of each genomic region. Both genomic matrices were computed from SNP-based genetic relationships between individuals using GCTA software (Yang et al. 2011). The pedigree-based relationship matrix was built from kinship relationships between individuals using the pedigree R package (Coster 2013) in RStudio (RStudio Inc 2013). For RHM and variance-component estimation, the genome was divided into windows of 50 genotyped SNPs, with consecutive windows overlapping by 25 SNPs. SSGWAS was performed with the mixed linear model association (MLMA) method (Yu et al. 2006) in GCTA, and SNPs were declared significant at a 5% threshold after Bonferroni correction of P-values. The SSGWAS results were then analyzed with the fastBAT method, also implemented in GCTA.
Results and discussion: For each replicate, after significant SNPs were identified, the genetic variance they explained was estimated using the equation of Falconer & Mackay (1996).
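The window, relationship-matrix and significance-threshold steps described in the methods can be sketched in plain Python. This is an illustration under stated assumptions, not GCTA's implementation: genotypes are assumed to be coded 0/1/2 in a list of lists, the relationship matrix uses the simple standardized-genotype estimator of Yang et al. (2011) for every entry (GCTA applies a slightly different formula on the diagonal), the handling of a shorter final window at the end of a chromosome is our own choice, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Genotypes = List[List[int]]  # hypothetical layout: genotypes[j][i] = individual j, SNP i, coded 0/1/2


def sliding_windows(n_snps: int, size: int = 50, step: int = 25) -> List[Tuple[int, int]]:
    """Half-open (start, end) SNP index ranges of `size` SNPs, advancing by `step`,
    so consecutive windows share `size - step` SNPs. Apply per chromosome.
    Assumption: a final shorter window covers any trailing SNPs."""
    if n_snps <= size:
        return [(0, n_snps)]
    windows = []
    start = 0
    while start + size <= n_snps:
        windows.append((start, start + size))
        start += step
    if windows[-1][1] < n_snps:  # trailing SNPs not yet covered
        windows.append((start, n_snps))
    return windows


def allele_freqs(geno: Genotypes) -> List[float]:
    """Frequency of the counted allele at each SNP (mean genotype / 2)."""
    n = len(geno)
    return [sum(row[i] for row in geno) / (2.0 * n) for i in range(len(geno[0]))]


def grm(geno: Genotypes, snps: Sequence[int], freqs: Sequence[float]) -> List[List[float]]:
    """A[j][k] = (1/m) * sum_i (x_ji - 2p_i)(x_ki - 2p_i) / (2 p_i (1 - p_i)),
    over the polymorphic SNPs in `snps` (all SNPs, or the SNPs of one window)."""
    usable = [i for i in snps if 0.0 < freqs[i] < 1.0]  # skip monomorphic SNPs
    if not usable:
        raise ValueError("no polymorphic SNPs in this set")
    scale = {i: math.sqrt(2.0 * freqs[i] * (1.0 - freqs[i])) for i in usable}
    z = [[(row[i] - 2.0 * freqs[i]) / scale[i] for i in usable] for row in geno]
    m = len(usable)
    return [[sum(a * b for a, b in zip(zj, zk)) / m for zk in z] for zj in z]


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-SNP P-value threshold controlling the family-wise error rate at `alpha`."""
    return alpha / n_tests


if __name__ == "__main__":
    geno = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 0]]  # toy data: 3 individuals, 4 SNPs
    p = allele_freqs(geno)
    G = grm(geno, range(len(p)), p)  # whole-genome matrix
    G_r = [grm(geno, range(s, e), p) for s, e in sliding_windows(len(p), size=2, step=1)]
    print(len(G), len(G_r))  # 3 3
    print(sliding_windows(120))  # [(0, 50), (25, 75), (50, 100), (75, 120)]
    print(bonferroni_threshold(19662))  # ≈ 2.54e-06
```

Each regional matrix from `grm` over one window plays the role of G_r in the RHM model, and the Bonferroni threshold is simply α divided by the number of SNPs tested after quality control.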
Table 1 reports, for the SSGWAS method, the number of detected QTLs, the MAF of these QTLs, and the range and mean of the genetic variance explained by significant SNPs and QTLs. Across the 30 simulation replicates, SSGWAS detected 16 QTLs, 2 with MAF ≤ 0.1 and the rest with MAF ≥ 0.1. The fastBAT method identified 107 significant regions, which contained 120 QTLs across the 3 scenarios, 52 of them with MAF ≤ 0.1. All QTLs detected by fastBAT and SSGWAS were also detected by RHM. RHM detected 612 regions containing simulated QTLs, including 316 QTLs with MAF ≤ 0.1. In all replicates, the variance explained by SNPs was equal to the variance explained by QTLs. SSGWAS detected fewer QTLs than the other two methods, and the maximum variance explained by its detected QTLs was 14.9%. A detected QTL was classified as a false positive when no significant QTL was found in the windows immediately before and after the significant window containing it. The percentage of false positive QTLs was higher for SSGWAS than for the other two methods, and, unlike the other two methods, fastBAT produced no false positive QTLs. Table 5 shows the number of detected QTLs, the MAF range of these QTLs, and the range and mean of the genetic variance explained by detected QTLs and SNPs for fastBAT. Many QTLs and regions detected by RHM were not detected by SSGWAS or fastBAT. The genetic variance explained by QTLs detected with RHM ranged from 7.26% to 46.86%, higher than with the other two methods. Table 6 compares the three methods in terms of the number of detected QTLs, false positive QTLs, stable QTLs, and detected QTLs with MAF ≤ 0.1. QTLs with MAF ≤ 0.1 were detected more frequently by RHM than by the other two methods.
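The variance percentages above rest on the Falconer & Mackay (1996) equation cited in the methods. Assuming that equation is the usual single-locus additive variance, 2pqa² (p and q = 1 − p the allele frequencies, a the allele substitution effect), a per-replicate percentage could be computed as sketched below. The helper names are hypothetical, loci are assumed to be in linkage equilibrium (covariances between loci are ignored), and dividing by the total genetic variance is our assumption about the denominator.

```python
from typing import Iterable, Tuple


def locus_variance(p: float, a: float) -> float:
    """Additive variance of one biallelic locus, 2 * p * q * a^2 (assumed Falconer & Mackay form)."""
    q = 1.0 - p
    return 2.0 * p * q * a * a


def percent_variance_explained(loci: Iterable[Tuple[float, float]], total_var: float) -> float:
    """Sum 2pqa^2 over (p, a) pairs and express it as a percentage of `total_var`.
    Assumes linkage equilibrium, so covariances between loci are ignored."""
    explained = sum(locus_variance(p, a) for p, a in loci)
    return 100.0 * explained / total_var


if __name__ == "__main__":
    # Hypothetical detected loci as (allele frequency, estimated effect) pairs.
    detected = [(0.5, 1.0), (0.1, 2.0)]
    print(percent_variance_explained(detected, total_var=2.0))  # ≈ 61.0
```

The 2pq factor is why low-MAF loci contribute little variance per unit of effect, which makes them harder to detect with single-SNP tests.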
Conclusion: In this study, the RHM method showed greater potential than SSGWAS and fastBAT for identifying QTLs affecting trait variance.
Keywords