PLoS ONE (Apr 2011)
An integrated approach for finding overlooked genes in Shigella.
Abstract
BackgroundThe completion of numerous genome sequences introduced an era of whole-genome study. However, many genes are missed during genome annotation, including small RNAs (sRNAs) and small open reading frames (sORFs). In order to improve genome annotation, we aimed to identify novel sRNAs and sORFs in Shigella, the principal etiologic agents of bacillary dysentery.Methodology/principal findingsWe identified 64 sRNAs in Shigella, which were experimentally validated in other bacteria based on sequence conservation. We employed computer-based and tiling array-based methods to search for sRNAs, followed by RT-PCR and northern blots, to identify nine sRNAs in Shigella flexneri strain 301 (Sf301) and 256 regions containing possible sRNA genes. We found 29 candidate sORFs using bioinformatic prediction, array hybridization and RT-PCR verification. We experimentally validated 557 (57.9%) DOOR operon predictions in the chromosomes of Sf301 and 46 (76.7%) in virulence plasmid.We found 40 additional co-expressed gene pairs that were not predicted by DOOR.Conclusions/significanceWe provide an updated and comprehensive annotation of the Shigella genome. Our study increased the expected numbers of sORFs and sRNAs, which will impact on future functional genomics and proteomics studies. Our method can be used for large scale reannotation of sRNAs and sORFs in any microbe with a known genome sequence.