Vavilov Journal of Genetics and Breeding (Jan 2016)

Flanking monomer repeats define lower context complexity of sites containing single nucleotide polymorphisms in the human genome

  • N. S. Safronova,
  • M. P. Ponomarenko,
  • I. I. Abnizova,
  • G. V. Orlova,
  • I. V. Chadaeva,
  • Y. L. Orlov

DOI
https://doi.org/10.18699/J15.092
Journal volume & issue
Vol. 19, no. 6
pp. 668 – 674

Abstract

We investigated mutation frequency in the human genome using the set of known single nucleotide polymorphisms (SNPs) from the “1000 Genomes” project. We developed and applied novel statistical and computational methods that analyze genetic text on the basis of its complexity. Complexity profiling in a sliding window was applied to SNP-containing sites in the human genome and revealed a local decrease in text complexity at these sites. Analysis of the complexity profiles shows that flanking monomer repeats define the lower context complexity of SNP-containing sites. The local decrease in text complexity at SNP-containing sites was confirmed by analysis of polymorphisms in the rat and mouse genomes. We also found context differences between coding and regulatory sequences; these differences reflect the complexity of SNP-containing loci. Changes in point mutation frequency were shown previously for microsatellite-containing sequences. Using enhanced mathematical tools and larger data sets, this work shows enrichment of polytracts and simple sequence repeats in the local genomic surroundings of SNP-containing sites. We found high-frequency oligonucleotides within genomic regions containing SNPs; these oligonucleotides are related to nucleotide polytracts. The presence of poly-A tracts might be associated with an increased probability of DNA double-helix breaks around mutable loci, followed by fixation of nucleotide changes. The complexity estimates were computed using a previously developed software tool. This tool allows for both (i) complexity estimation of phased samples and (ii) rapid and effective identification of the frequency spectrum of oligonucleotides of fixed length, together with comparison of oligonucleotide frequencies between samples.
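The two operations named in the abstract, sliding-window complexity profiling and fixed-length oligonucleotide counting, can be illustrated with a minimal Python sketch. The abstract does not specify which complexity measure the authors' tool uses, so Shannon entropy of the nucleotide composition serves here only as an assumed stand-in. The function names, the window length, and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def kmer_counts(seq, k):
    """Count all overlapping oligonucleotides of fixed length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shannon_entropy(window):
    """Shannon entropy (bits) of the nucleotide composition of a window.

    Assumption: entropy stands in for the complexity measure; the
    abstract does not say which measure the authors' tool computes.
    """
    n = len(window)
    return -sum((c / n) * log2(c / n) for c in Counter(window).values())


def complexity_profile(seq, window):
    """Entropy of every window of the given length, sliding one base at a time."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]


# Toy sequence, made up for illustration: a poly-A run flanked by mixed sequence.
seq = "ACGTACGT" + "AAAAAAAA" + "ACGTACGT"
profile = complexity_profile(seq, window=8)
spectrum = kmer_counts(seq, 4)
```

In this toy case the profile is highest over the mixed flanks and falls to zero over the poly-A run, and the 4-mer spectrum is dominated by `AAAA`. This mirrors, in a very simplified form, the kind of local complexity decrease and polytract enrichment that the abstract reports around SNP-containing sites.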

Keywords