International Journal of Molecular Sciences (Aug 2023)

Classification of Promoter Sequences from Human Genome

  • Konstantin Zaytsev,
  • Alexey Fedorov,
  • Eugene Korotkov

DOI
https://doi.org/10.3390/ijms241612561
Journal volume & issue
Vol. 24, no. 16
p. 12561

Abstract

We have developed a new method for promoter sequence classification based on a genetic algorithm and the MAHDS sequence alignment method. We have created four classes of human promoters, combining 17,310 sequences out of the 29,598 present in the EPD database. We searched the human genome for potential promoter sequences (PPSs) using dynamic programming and position weight matrices representing each of the promoter sequence classes. A total of 3,065,317 potential promoter sequences were found. Only 1,241,206 of them were located in unannotated parts of the human genome. All remaining PPSs overlapped known promoters, transposable elements, or interspersed repeats. We found a strong intersection between PPSs and Alu elements as well as transcription start sites. The number of false positive PPSs is estimated to be 3 × 10⁻⁸ per nucleotide, which is several orders of magnitude lower than for any other promoter prediction method. The developed method can be used to search for PPSs in various eukaryotic genomes.
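The abstract does not spell out the search procedure. To illustrate the general idea of scanning a genome with a position weight matrix under dynamic programming, here is a minimal sketch of a generic Smith-Waterman-style local alignment of a DNA window against a PWM, using a linear gap penalty. It is not the authors' implementation. The function names `pwm_local_alignment_score` and `scan_for_pps`, the gap penalty, window size, step and threshold, and the toy "TATA" matrix used below are all hypothetical.

```python
from typing import List, Tuple

ALPHABET = "ACGT"


def pwm_local_alignment_score(seq: str, pwm: List[List[float]], gap: float = 2.0) -> float:
    """Best local alignment score of `seq` against a PWM (generic Smith-Waterman sketch).

    pwm[j][k] is the score of base ALPHABET[k] at matrix column j.
    `gap` is a hypothetical linear penalty for skipping a sequence base or a PWM column.
    """
    seq = seq.upper()
    m = len(pwm)
    prev = [0.0] * (m + 1)  # DP row for the previous sequence position
    best = 0.0
    for base in seq:
        cur = [0.0] * (m + 1)
        k = ALPHABET.find(base)  # -1 for ambiguous bases such as N
        for j in range(1, m + 1):
            # Align this base to PWM column j-1; ambiguous bases contribute 0.
            match = prev[j - 1] + (pwm[j - 1][k] if k >= 0 else 0.0)
            # Local alignment: scores never drop below 0 (start a new alignment).
            cur[j] = max(0.0, match, prev[j] - gap, cur[j - 1] - gap)
            best = max(best, cur[j])
        prev = cur
    return best


def scan_for_pps(genome: str, pwm: List[List[float]], window: int,
                 threshold: float, step: int = 1) -> List[Tuple[int, float]]:
    """Slide a window along `genome`; report (start, score) for windows scoring >= threshold.

    Window, step and threshold are hypothetical parameters; sequences shorter than
    `window` yield no windows.
    """
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        score = pwm_local_alignment_score(genome[start:start + window], pwm)
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy matrix: +2 for the consensus base of "TATA", -1 otherwise (illustrative only).
    tata = [[2.0 if b == c else -1.0 for b in ALPHABET] for c in "TATA"]
    print(pwm_local_alignment_score("GGTATAGG", tata))
    print(scan_for_pps("CCCCTATACCCC", tata, window=6, threshold=8.0))
```

Allowing gaps lets a window match the matrix even when a base is inserted or deleted relative to the consensus. A plain ungapped PWM sum would miss such cases. In a real search, the threshold would be calibrated against a false-positive rate per nucleotide such as the one quoted above.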

Keywords