Animal Science Research (پژوهش های علوم دامی) (Mar 2022)

Cluster analysis of milk fat yield trait in dairy cows using meta-analysis of the genome-wide association studies

  • S Bakhshalizadeh1, S Zerehdaran2* and A Javadmanesh3

DOI
https://doi.org/10.22034/AS.2022.37887.1545
Journal volume & issue
Vol. 31, no. 4
pp. 29 – 42

Abstract

Introduction: Cow's milk is one of the most important animal products for any country and is known as an important source of animal-derived fat and protein (Fenelon and Guinee, 1999; Martini et al., 2016). Milk fat strongly affects the taste of milk and other dairy products (Spelman et al., 1996). In ruminants, including dairy cows, the liver plays an important role in the metabolism of carbohydrates, fats, vitamins, and hormones. Nutrients absorbed from the gastrointestinal tract pass through the liver, enter the blood circulation, and eventually reach the mammary glands. Therefore, the liver plays an essential role in cow lactation (Graber et al., 2010; Schlegel et al., 2012). All of the components that determine milk quality can be considered quantitative traits that are controlled by many genes and influenced by environmental factors. If genetic markers explain a significant part of the variation, they can be considered ideal candidates for genomic selection (Shi et al., 2019). Previously, microsatellite markers were frequently used to identify quantitative trait loci (QTL). With the advent of single-nucleotide polymorphisms (SNPs), these markers are now used in genome-wide association studies (GWAS) to identify QTL. In dairy cows, some of the major genes with significant effects on milk fat have been identified in previous GWAS. Because a large number of GWAS exist for dairy cows, these studies can be combined using meta-analysis to achieve higher statistical power. Such studies contribute to our current understanding of the genetic regulation of milk fat yield and provide a better understanding of the genetic architecture of complex traits. Network clustering algorithms and cluster identification are important tools in the structural analysis of networks.
Various types of clustering algorithms are used to analyze protein-protein interaction (PPI) networks. In this study, we used an algorithm known as MCODE to identify dense regions in the PPI network (Bader and Hogue, 2003). The overall purpose of PPI network clustering is to group genes or proteins that are related to each other according to various measures. A PPI network contains different proteins that are important in different pathways. These genes or proteins are clustered based on a similarity metric, expressed as a distance matrix. It is also important to predict molecular assemblies from protein interaction data because this provides another level of functional annotation (Gollapalli et al., 2015). The purpose of this study was to conduct a meta-analysis of GWAS followed by cluster analysis to identify genes that affect milk fat yield in dairy cows.

Material and methods: The data used in this study were GWAS summary data collected from 19 studies published from 2010 to 2019. The sources included journal papers and dissertations (only valid dissertations with published papers). All available genes were combined, synthesized, and evaluated using a meta-analysis method. Cytoscape v3.7.2 was used to analyze and visualize the genes with the STRING v1.5.0 plugin and to extract clusters with the MCODE v1.5.1 plugin. In this way, the GWAS summary results were combined with PPI molecular networks, which play a significant role in increasing the power of association studies to identify genes affecting the milk fat trait. The DAVID server was also used to identify gene ontology (GO) term enrichment, in order to detect enriched biological terms associated with genomic regions, and to identify gene networks using functional annotation clustering tools based on enriched pathway analysis.

Results and discussion: In this study, we analyzed 223 genes using the STRING plugin in Cytoscape.
These genes were each associated with at least one other gene and showed direct and partial correlations with each other. The gene network constructed for the milk fat yield trait included 213 genes (nodes) and 219 edges (gene connections). The P-value calculated for the STRING network was statistically significant for enriched pathways in PPIs. The set of important and prominent genes was evaluated using the MCODE plugin. Seven clusters were identified in this network. For instance, cluster 1 included the proteins encoded by the ARHGAP39, CPSF1, CYHR1, PPP1R16A, GRINA, MROH1, and SMPD5 genes. As shown in Table 2, cluster 1 (score = 7) consisted of 7 nodes connected by 21 edges. This cluster contained proteins that play important roles in the endoplasmic reticulum lumen (cellular component), metal ion binding (molecular function), and as integral components of the membrane (cellular component). CPSF1, CYHR1, and GRINA were the major genes involved in the endoplasmic reticulum lumen, metal ion binding, and integral membrane components, respectively. Clusters 1 and 2 had the highest scores among all reported clusters.

Conclusion: These results show that using data from different sources can increase the reliability and accuracy of GWAS. We were able to identify the most important genes in the network pathways using GWAS summary data in cluster analysis. This method helps characterize the proteins involved in fat yield while facilitating our understanding of their molecular structures. The most important genes with high scores were identified in cluster 1 (ARHGAP39, CPSF1, CYHR1, PPP1R16A, GRINA, MROH1, and SMPD5) and cluster 2 (HERC1, UBR4, ASB17, TRIM9, KLHL2, and BTRC). Based on existing biological knowledge, these clusters can help data mining and systems models capture network interactions and pathways.
These protein clusters provide deep insight into how genes interact with each other in network analysis for fat yield. Moreover, meta-analysis of GWAS summary data can play an important role in the broader understanding of network visualization and cluster analysis of genes identified in enriched pathways. Therefore, cluster analysis can improve the power to identify genes for economically important traits such as milk fat yield in dairy cow populations and can be used in future genomic evaluations and breeding programs.
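
The cluster score reported in the results can be checked with a short calculation. The sketch below assumes the MCODE convention of scoring a cluster as its graph density multiplied by its number of nodes, with density computed for a simple undirected graph without self-loops. It also assumes that the figure of 21 reported for cluster 1 counts edges (interactions) among its 7 genes. The function names are illustrative and are not part of Cytoscape or MCODE.

```python
def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: edges / maximum possible edges.

    Assumption: no self-loops, so the maximum is n * (n - 1) / 2.
    """
    if num_nodes < 2:
        return 0.0
    max_edges = num_nodes * (num_nodes - 1) / 2
    return num_edges / max_edges


def mcode_score(num_nodes: int, num_edges: int) -> float:
    """Cluster score taken as density times number of nodes (assumed convention)."""
    return graph_density(num_nodes, num_edges) * num_nodes


def average_degree(num_nodes: int, num_edges: int) -> float:
    """Mean number of connections per node in an undirected graph."""
    return 2 * num_edges / num_nodes


# Cluster 1 genes as listed in the abstract.
cluster1_genes = ["ARHGAP39", "CPSF1", "CYHR1", "PPP1R16A", "GRINA", "MROH1", "SMPD5"]
cluster1_edges = 21  # assumed to be the edge count for cluster 1

print("cluster 1 density:", graph_density(len(cluster1_genes), cluster1_edges))
print("cluster 1 score:", mcode_score(len(cluster1_genes), cluster1_edges))

# Whole network as reported: 213 nodes and 219 edges.
print("network average degree:", round(average_degree(213, 219), 3))
```

Under these assumptions, 7 nodes with 21 edges form a fully connected subgraph (density 1), which gives a score of 7 and agrees with the value reported for cluster 1. The full network is much sparser, with roughly two connections per gene on average.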

Keywords