Bioinformatics and Biology Insights (Jan 2008)

Estimating the Fraction of Non-Coding RNAs in Mammalian Transcriptomes

  • Tamar Schlick,
  • Hin Hark Gan,
  • Giulio Quarta,
  • Yurong Xin

Journal volume & issue
Vol. 2
pp. 77 – 95

Abstract

Recent studies of mammalian transcriptomes have identified numerous RNA transcripts that do not code for proteins; their identity, however, is largely unknown. Here we explore an approach based on sequence randomness patterns to discern different RNA classes. The relative z-score we use helps distinguish the known ncRNA class from the genome, intergene and intron classes. This leads us to a measure of the ncRNA fraction in putative ncRNA datasets, which we model as a mixture of genuine ncRNAs and other transcripts derived from genomic, intergenic and intronic sequences. We use this model to analyze six representative datasets identified by the FANTOM3 project and by two computational approaches based on comparative analysis (RNAz and EvoFold). Our analysis suggests fewer ncRNAs than estimated by DNA sequencing and comparative analysis, but validating our approach and its predictions requires more extensive experimental RNA data.
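
The mixture idea in the abstract can be illustrated with a minimal sketch. The abstract does not define the relative z-score or the exact form of the mixture model, so everything below is an assumption made for illustration: a simple two-component linear mixture in which a dataset's mean score is a weighted average of the mean score of known ncRNAs and the mean score of a background class. The function name `mixture_fraction` and its parameters are hypothetical and are not taken from the paper.

```python
from statistics import mean


def mixture_fraction(dataset_scores, ncrna_scores, background_scores):
    """Estimate the ncRNA fraction f under an ASSUMED linear mixture:

        mean(dataset) = f * mean(ncRNA) + (1 - f) * mean(background)

    This is an illustrative stand-in, not the model used in the paper.
    """
    m_dataset = mean(dataset_scores)
    m_ncrna = mean(ncrna_scores)
    m_background = mean(background_scores)

    # If the two reference classes have the same mean, f is not identifiable.
    if m_ncrna == m_background:
        raise ValueError("reference classes are indistinguishable by mean score")

    f = (m_dataset - m_background) / (m_ncrna - m_background)

    # Clamp to [0, 1], since a fraction outside that range is not meaningful.
    return min(1.0, max(0.0, f))
```

Under these assumptions, a dataset whose mean score lies halfway between the two reference means would be assigned a fraction of 0.5. A real analysis would also need uncertainty estimates and more than one background class (the abstract names genomic, intergenic and intronic sequences).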

Keywords