Journal of Integrative Bioinformatics (Dec 2004)

Searching for ncRNAs in eukaryotic genomes: Maximizing biological input with RNAmotif

  • Collins Lesley J.,
  • Macke Thomas J.,
  • Penny David

DOI
https://doi.org/10.1515/jib-2004-6
Journal volume & issue
Vol. 1, no. 1
pp. 64 – 79

Abstract

Non-coding RNAs (ncRNAs) contain both characteristic secondary structure and short sequence motifs. However, “complex” ncRNAs (RNAs bound to proteins in ribonucleoprotein complexes) can be hard to identify in genomic sequence data. Programs able to search for ncRNAs were previously limited to ncRNA molecules that either align very well or have highly conserved secondary structure. The RNAmotif program uses additional information to find ncRNA gene candidates through the design of an appropriate “descriptor” that models sequence motifs, secondary structure and protein/RNA binding information. This enables searches for ncRNAs that have variable secondary structure and limited sequence motif information. By applying the biologically based concept of “positive and negative controls” to the RNAmotif search technique, we can now go beyond the testing phase and successfully search real genomes, complete with their background noise and related molecules. We design descriptors for two “complex” ncRNAs, the U5 snRNA (from the spliceosome) and RNase P RNA, and these descriptors successfully uncover the corresponding sequences in some eukaryotic genomes. We include explanations of how the input “descriptors” are constructed from known biological information, so that searches for other ncRNAs can be designed. RNAmotif maximizes the input of biological knowledge into a search for an ncRNA gene and now allows the investigation of some of the hardest-to-find, yet important, genes in some very interesting eukaryotic organisms.
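
The idea of combining a structural constraint with a sequence motif, and then checking the result against positive and negative controls, can be illustrated with a small Python sketch. This is not RNAmotif and does not use its descriptor language. The function names (`is_helix`, `find_hairpins`, `score_controls`), the single stem-loop model and the parameter names are all hypothetical, chosen only to show the general shape of such a search under simplifying assumptions (one helix, no mismatches or bulges, G-U wobble pairs allowed).

```python
import re

# Watson-Crick pairs plus the G-U wobble pair (an assumption of this toy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_helix(five_prime, three_prime):
    """True if the two strands pair base by base when read antiparallel."""
    if len(five_prime) != len(three_prime):
        return False
    return all((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


def find_hairpins(seq, stem_len, loop_min, loop_max, loop_motif):
    """Return (start, end) spans of stem-loops whose loop contains loop_motif."""
    seq = seq.upper().replace("T", "U")  # accept DNA input from genome files
    motif = re.compile(loop_motif)
    hits = []
    for start in range(len(seq) - 2 * stem_len - loop_min + 1):
        for loop_len in range(loop_min, loop_max + 1):
            end = start + 2 * stem_len + loop_len
            if end > len(seq):
                break
            stem5 = seq[start:start + stem_len]
            loop = seq[start + stem_len:end - stem_len]
            stem3 = seq[end - stem_len:end]
            # Both constraints must hold: the structure AND the sequence motif.
            if is_helix(stem5, stem3) and motif.search(loop):
                hits.append((start, end))
    return hits


def score_controls(positives, negatives, **descriptor):
    """Count positive controls recovered and negative controls wrongly hit."""
    found = sum(1 for s in positives if find_hairpins(s, **descriptor))
    false_hits = sum(1 for s in negatives if find_hairpins(s, **descriptor))
    return found, false_hits
```

In this sketch a descriptor is considered usable when it recovers the known examples (positive controls) while staying silent on sequences that satisfy only the structure or only the motif (negative controls). Loosening `loop_max` or the motif pattern and re-running `score_controls` shows how sensitivity trades off against false hits.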