BMC Genomic Data (Jun 2022)
Bio-informatic analysis of CRISPR protospacer adjacent motifs (PAMs) in T4 genome
Abstract
Background: The presence of protospacer adjacent motif (PAM) sequences in bacteriophage genomes is critical for the recognition and function of the clustered regularly interspaced short palindromic repeats-Cas (CRISPR-Cas) machinery. We further elucidate the significance of PAMs and their function, particularly as part of the transcriptional regulatory regions of T4 bacteriophages.

Methods: A scripting language was used to analyze the T4 phage genome sequence against a list of selected PAMs. The Mann-Whitney-Wilcoxon (MWW) test was used to compare the number of sequence hits for the PAMs with the number of hits for all possible sequences of equal length.

Results: The MWW test shows that certain PAMs, such as ‘NGG’ and ‘TATA’, are preferentially located at the core of phage promoters, around the -10 position, whereas the region around the -35 position shows no detectable variation in the counts of any of the tested PAMs. Among all tested PAMs, three sequences (5’-GCTV-3’, 5’-TTGAAT-3’ and 5’-TTGGGT-3’) have a higher prevalence in essential genes. Analysis of all the possible ways of reading PAM sequences as codons for the corresponding amino acids showed that the deduced amino acids of some PAMs have a significant tendency to occur on the surface of proteins.

Conclusion: These results provide novel insights into the location of PAMs and into their role as transcriptional regulatory elements. In addition, CRISPR targeting of certain PAM sequences appears to be connected to the hydrophilicity (water solubility) of the amino acids translated from the PAM triplets. These amino acids are therefore found at the interacting units of protein-protein interfaces.
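The counting step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' script: the function names (`count_hits`, `background_counts`, `mann_whitney_u`), the toy sequence, and the reading of "all possible sequences of equal length" as every A/C/G/T k-mer are assumptions made for illustration. Degenerate PAMs such as ‘NGG’ or ‘GCTV’ are expanded with the standard IUPAC nucleotide codes, and only the U statistic is computed, not a p-value.

```python
import re
from itertools import product

# Standard IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def count_hits(motif, sequence):
    """Count overlapping occurrences of an IUPAC motif in a DNA sequence."""
    pattern = "".join(IUPAC[base] for base in motif.upper())
    # A zero-width lookahead lets overlapping matches be counted.
    return len(re.findall(f"(?={pattern})", sequence.upper()))


def background_counts(k, sequence):
    """Hit counts for every plain A/C/G/T sequence of length k (assumed background)."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: count_hits(kmer, sequence) for kmer in kmers}


def mann_whitney_u(xs, ys):
    """U statistic for xs versus ys; ties contribute 0.5. No p-value is computed."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


toy = "TTGGAGGTATAATGG"  # made-up sequence, not T4 data
print(count_hits("NGG", toy))   # 3
print(count_hits("TATA", toy))  # 1
```

In an analysis of this kind, per-window counts for a PAM (for example, windows around the -10 position of each promoter) would form one sample and the corresponding counts for the background sequences the other; a statistics library would then supply the p-value for the U statistic.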
Keywords