Вавиловский журнал генетики и селекции (Jan 2016)
The use of graphics accelerators to detect functional signals in the regulatory regions of prokaryotic genes
Abstract
Various methods for identification of significant contextual signals are widely used to search for transcription factor binding sites and to identify the structural and functional organization of regulatory regions. These methods do not require any pre-alignment of the sample sequences analyzed or experimental information about the exact location of transcription factor binding sites. Methods of searching for contextual signals, based on the identification of degenerate oligonucleotide motives recorded in the 15-letter IUPAC code have become widespread. An essential problem with degenerate motifs is their great diversity, which makes the researchers apply heuristics which do not guarantee that the most significant signal will be found. The development of high-performance computing systems based on the use of graphics cards has made it possible to use the exact exhaustive methods to identify significant motifs. We have developed a new system for identifying significant degenerate oligonucleotide motifs of a given length in the regulatory regions based on the use of widespread graphics cards that provides a search for the signal with the greatest significance. High efficiency of the GPU compared with CPU was demonstrated. Using the proposed approach, we analyzed the regulatory regions of B. subtilis, E. coli, H. pylori, M. gallisepticum, M. genitalium and M. pneumoniae genes. Sets of degenerate motifs have been identified for each species of prokaryotes. They were classified on the basis of similarity with the transcription factor binding sites of E. coli.
Keywords