پژوهش های علوم دامی (Sep 2021)

Study of Long Range DNA Correlations for Genes Affecting Milk Yield of Dairy Cow

  • R. Abadeh, M. Ghaderi-Zefrehei, M. Aminafshar, SA Mohammadi, M. Chamani

DOI
https://doi.org/10.22034/as.2021.30510.1464
Journal volume & issue
Vol. 31, no. 2
pp. 29 – 43

Abstract

Background and Objective: For mathematically oriented investigators, DNA is a string of symbols whose correlation structure can be characterized almost completely by all possible base-base correlation functions at any range, short or long, or by their corresponding power spectra. Long-range correlations between bases in a DNA sequence are a statistical feature found in the genomes of many eukaryotes. Their existence indicates DNA rearrangement or duplication processes. Such phenomena are not directly applicable to breeding and are mostly used in evolutionary studies. Our basic assumption in this study was that by extracting long-range DNA correlations between all the different nucleotides within a gene, it is possible first to quantify the degree of correlation between them and possibly to conduct SNP-based research more effectively. For various reasons, not all investigations aimed at a complete characterization of the long-range correlation structure of DNA sequences have been motivated by biology. Rather, many were motivated by problems in mathematical modeling, cryptographic language code detection, dynamical systems, stochastic processes, and noise detection. Perhaps for this reason, long-range correlation structure has not yet become part of the toolbox of “mainstream” DNA sequence analysis in human genetics and breeding settings. DNA correlations in a sequence of finite length can be estimated with a frequency-count estimator, an indirect Bayesian estimator, or a direct Bayesian estimator. Here we followed the approach underlying CorGen. Materials and methods: Twenty-four genes were selected from among the genes affecting milk yield in dairy cows. The number and length of the exons of each gene and its position on the chromosome were obtained from the NCBI gene bank, and the sequences were saved in FASTA format.
Using software previously written in the C# language for this research, the accession numbers of the studied genes were entered and the corresponding output was obtained. CorGen software was used to calculate the long-range DNA correlations of the genes involved in milk production. Results: The results showed a significant level of long-range correlation in the DNA sequences of a number of genes, such as EZR, FGG, KRT6A, RAB1A, EIF3L, TBC1D20, ZNF419, S100A16, MRPL3, TPPP3, and PHF10. The decay exponent of the power-law function fitted to the long-range correlations obtained from genes of different lengths ranged from 0.146 to 0.643, so it can be concluded that the decay of long-range correlations with increasing distance between positions in the DNA sequence does not follow a random process. Thus, the fractal geometry of nature is also seen in these genes. This research was a first attempt to address long-range DNA correlations in dairy cattle genes. The work had at least two goals. First, there has been disagreement about the correlation structure of DNA sequences. Because the actual result remains disputed, some researchers still believe that DNA sequences do not exhibit any long-range correlation that cannot be explained by basic known stochastic processes such as a random sequence or a Markov chain, the former having no correlation by construction and the latter accounting only for short-range correlations. Resolving this disagreement can be straightforward once everybody agrees to use the same measure of correlation, use the same estimator, and apply it to the same sequence. The second goal is to encourage more biologically motivated study of the long-range correlation structure of DNA sequences, especially in animal breeding. Although this research does not accomplish this task, the intention was at least to put the issue forward.
Most current studies of correlation (especially long-range correlation) in DNA sequences examine base-base statistical correlations. Base-base correlation may not be a powerful tool for revealing correlation on a global scale or between larger blocks in DNA sequences. Conclusion: The genes studied were shown to have high complexity and a mode of invariance in their DNA. This type of analysis can be extended to breeding settings. A more complete characterization of correlation between bases at both short and long distances became possible only as long DNA sequences became more commonly available. Now, thanks to the rapid growth of DNA sequencing technologies, almost the entire genome of an organism can be sequenced quickly and at low cost. Therefore, raw data should be available to the many researchers who wish to test new DNA correlation hypotheses on readily available DNA sequences. The claim of base-base statistical correlation at long distances in DNA sequences is thought to be still a few steps away from revealing a naive organizing principle of the genome.
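As an illustration of the quantities discussed in this abstract, the following Python sketch estimates a base-base correlation with a frequency-count estimator and then fits a power-law decay exponent to a set of correlation values. It is not the CorGen implementation, nor the authors' C# software. The function names `base_base_correlation` and `power_law_exponent`, the definition C_ab(k) = p_ab(k) - p_a p_b, and the least-squares fit on log-log values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def base_base_correlation(seq, a, b, k):
    """Frequency-count estimate of C_ab(k) = p_ab(k) - p_a * p_b.

    p_ab(k) is the fraction of positions i where seq[i] == a and
    seq[i + k] == b; p_a and p_b are the single-base frequencies.
    (Hypothetical helper; the definition is an assumption.)
    """
    seq = seq.upper()
    n = len(seq)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(seq)")
    counts = Counter(seq)
    p_a = counts[a] / n
    p_b = counts[b] / n
    # Count ordered pairs (a at i, b at i + k) over the n - k valid positions.
    pairs = sum(1 for i in range(n - k) if seq[i] == a and seq[i + k] == b)
    return pairs / (n - k) - p_a * p_b


def power_law_exponent(distances, correlations):
    """Return alpha for the model |C(k)| ~ A * k**(-alpha).

    Fits a straight line to (log k, log |C(k)|) by ordinary least
    squares and returns the negated slope. Zero correlations are
    skipped because their logarithm is undefined.
    """
    points = [
        (math.log(k), math.log(abs(c)))
        for k, c in zip(distances, correlations)
        if c != 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two non-zero correlation values")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    return -sxy / sxx


if __name__ == "__main__":
    # Toy sequence, purely illustrative; real input would be a gene in FASTA format.
    toy = "ATGCGATATTAGCGCGATATATCGGCTA" * 20
    ks = [1, 2, 4, 8, 16, 32]
    cs = [base_base_correlation(toy, "A", "T", k) for k in ks]
    print(list(zip(ks, cs)))
    print("fitted exponent:", power_law_exponent(ks, cs))
```

In a setting like the one described, an exponent between 0 and 1 would correspond to a slowly decaying, power-law correlation. The toy sequence here is periodic and serves only to exercise the functions, so its fitted exponent carries no biological meaning.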

Keywords