BMC Genomics (Mar 2007)
Identification of plant promoter constituents by analysis of local distribution of short sequences
Abstract
Abstract Background Plant promoter architecture is important for understanding regulation and evolution of the promoters, but our current knowledge about plant promoter structure, especially with respect to the core promoter, is insufficient. Several promoter elements including TATA box, and several types of transcriptional regulatory elements have been found to show local distribution within promoters, and this feature has been successfully utilized for extraction of promoter constituents from human genome. Results LDSS (Local Distribution of Short Sequences) profiles of short sequences along the plant promoter have been analyzed in silico, and hundreds of hexamer and octamer sequences have been identified as having localized distributions within promoters of Arabidopsis thaliana and rice. Based on their localization patterns, the identified sequences could be classified into three groups, pyrimidine patch (Y Patch), TATA box, and REG (Regulatory Element Group). Sequences of the TATA box group are consistent with the ones reported in previous studies. The REG group includes more than 200 sequences, and half of them correspond to known cis-elements. The other REG subgroups, together with about a hundred uncategorized sequences, are suggested to be novel cis-regulatory elements. Comparison of LDSS-positive sequences between Arabidopsis and rice has revealed moderate conservation of elements and common promoter architecture. In addition, a dimer motif named the YR Rule (C/T A/G) has been identified at the transcription start site (-1/+1). This rule also fits both Arabidopsis and rice promoters. Conclusion LDSS was successfully applied to plant genomes and hundreds of putative promoter elements have been extracted as LDSS-positive octamers. Identified promoter architecture of monocot and dicot are well conserved, but there are moderate variations in the utilized sequences.