MethodsX (Jan 2022)
New complementary Python codes to locate Single Nucleotide Polymorphisms (SNPs) and Overlapping G-Quadruplex Sequences (G4s)
Abstract
G-quadruplexes (G4s) are non-canonical DNA and RNA secondary structures that are involved in gene regulation. A single nucleotide polymorphism (SNP) is a small genetic variation within a DNA sequence that accounts for variability between individuals. While the majority of SNPs, especially those frequent in the population, are considered benign genetic variations, a few can lead to disease. SNPs occurring in G4 sequences have been reported to modulate gene regulation. To find overlaps between predicted G4 sequences and SNPs located in genomic regions, we developed two complementary Python codes, SNP-locator and G4-overlap. Given the genetic variant name and the FASTA sequence of the corresponding gene, the codes map a mutation to the overlapping or closest G4 sequence. We validated the two codes on a set of 31 SNP variants occurring in cytochrome P450 genes and podocyte-marker genes. Of the 31 SNPs, 28 were accurately located using these codes.
• The SNP-locator code locates any SNP in promoters, upstream regulatory regions, exons and introns.
• The SNP-locator code requires the FASTA genomic sequence of the studied gene and the genetic variant nomenclature at the cDNA level.
• The G4-overlap code maps the SNP to the overlapping or the closest G4 sequence.
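The mapping step described in the abstract can be illustrated with a minimal Python sketch. It is not the published SNP-locator or G4-overlap code: the names `G4`, `parse_substitution`, `distance_to_g4` and `map_snp_to_g4` are hypothetical, the variant pattern covers only simple cDNA-level substitutions such as `c.-163C>A`, and the SNP position passed to `map_snp_to_g4` is assumed to be already expressed in the same 1-based coordinates as the predicted G4 intervals. Converting a cDNA position to a position in the genomic FASTA sequence depends on the gene's exon structure and is not attempted here.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class G4(NamedTuple):
    """A predicted G4 sequence as a 1-based, inclusive interval (hypothetical type)."""
    start: int
    end: int


# Simplified, assumed pattern: only substitutions such as "c.-163C>A" or
# "c.1075A>C". Intronic offsets, deletions, insertions, etc. are not handled.
_SUBSTITUTION = re.compile(r"^c\.(-?\d+)([ACGT])>([ACGT])$")


def parse_substitution(name: str) -> Tuple[int, str, str]:
    """Split a simple cDNA-level substitution name into (position, ref, alt)."""
    match = _SUBSTITUTION.match(name)
    if match is None:
        raise ValueError(f"unsupported variant name: {name!r}")
    return int(match.group(1)), match.group(2), match.group(3)


def distance_to_g4(pos: int, g4: G4) -> int:
    """Return 0 if pos lies inside the G4, else the gap to its nearest edge."""
    if g4.start <= pos <= g4.end:
        return 0
    return g4.start - pos if pos < g4.start else pos - g4.end


def map_snp_to_g4(pos: int, g4s: List[G4]) -> Optional[Tuple[G4, int]]:
    """Return the overlapping G4 (distance 0) or, failing that, the closest one."""
    if not g4s:
        return None
    best = min(g4s, key=lambda g4: distance_to_g4(pos, g4))
    return best, distance_to_g4(pos, best)


if __name__ == "__main__":
    # Illustrative intervals only; these are not results from the study.
    predicted = [G4(10, 30), G4(100, 125), G4(200, 230)]
    print(parse_substitution("c.-163C>A"))
    print(map_snp_to_g4(120, predicted))  # SNP inside a G4
    print(map_snp_to_g4(180, predicted))  # SNP between two G4s
```

A distance of 0 marks an overlap; any positive value is the number of bases between the SNP and the nearest edge of the closest predicted G4.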