Frontiers in Genetics (Mar 2016)

Global Intersection of Long Non-Coding RNAs with Processed and Unprocessed Pseudogenes in the Human Genome

  • Michael John Milligan,
  • Erin Harvey,
  • Albert Yu,
  • Ashleigh Morgan,
  • Caio Damski,
  • Daniela-Lee Smith,
  • Eden Zhang,
  • Jothini Sivananthan,
  • Jonathan Berengut,
  • Radhini Subramaniam,
  • Aleksandra Skoric,
  • Scott Collins,
  • Kevin V Morris,
  • Leonard Lipovich

DOI
https://doi.org/10.3389/fgene.2016.00026
Journal volume & issue
Vol. 7

Abstract


Pseudogenes are abundant in the human genome and had long been thought of purely as nonfunctional gene fossils. Recent observations point to a role for pseudogenes in regulating genes transcriptionally and post-transcriptionally in human cells. To computationally interrogate the network space of integrated pseudogene and long non-coding RNA (lncRNA) regulation in the human transcriptome, we developed and implemented an algorithm to identify all lncRNA transcripts that overlap the genomic spans, and specifically the exons, of any human pseudogene in either sense or antisense orientation. As inputs to our algorithm, we imported three public pseudogene repositories: GENCODE v17 (processed and unprocessed, Ensembl 72), Retroposed Pseudogenes V5 (processed only), and Yale Pseudo60 (processed and unprocessed, Ensembl 60); two public lncRNA catalogs: Broad Institute and GENCODE v17; NCBI-annotated piRNAs; and NHGRI clinical variants. All data sets were retrieved from the UCSC Genome Database using the UCSC Table Browser. We identified 2277 loci containing exon-to-exon overlaps between pseudogenes, both processed and unprocessed, and lncRNA genes. Of these loci, 1167 had GenBank EST and full-length cDNA support, providing direct evidence of transcription on one or both strands of the exon-to-exon overlap. The analysis converged on 313 pseudogene-lncRNA exon-to-exon overlaps that were bidirectionally supported by both full-length cDNAs and ESTs. In the process of identifying transcribed pseudogenes, we generated a comprehensive, positionally non-redundant encyclopedia of human pseudogenes, drawing upon multiple, formerly disparate public pseudogene repositories. Collectively, these observations suggest that pseudogenes are pervasively transcribed on both strands and are common drivers of gene regulation.
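The overlap search described in the abstract (lncRNAs that overlap a pseudogene's genomic span, and specifically its exons, in sense or antisense orientation) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transcript` record, the function names, and the toy coordinates are hypothetical, and coordinates are assumed to be 0-based, half-open (the convention used in UCSC Genome Browser tables). Orientation is taken as "sense" when both transcripts lie on the same strand and "antisense" otherwise.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (start, end) pair; assumed 0-based, half-open, as in UCSC tables.
Interval = Tuple[int, int]


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record (not a UCSC table schema)."""
    name: str
    chrom: str
    strand: str                  # "+" or "-"
    exons: Tuple[Interval, ...]

    def span(self) -> Interval:
        # Genomic span: first exon start to last exon end, introns included.
        return (min(s for s, _ in self.exons), max(e for _, e in self.exons))


def overlaps(a: Interval, b: Interval) -> bool:
    # Half-open intervals overlap iff each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def classify_overlaps(pseudogenes: List[Transcript],
                      lncrnas: List[Transcript]) -> List[Tuple[str, str, str, str]]:
    """Return (pseudogene, lncRNA, orientation, level) for every pair whose
    genomic spans overlap. level is "exon" if any pseudogene exon overlaps
    any lncRNA exon, otherwise "span-only". Naive O(n*m) pairwise scan."""
    hits = []
    for pg in pseudogenes:
        for lnc in lncrnas:
            if pg.chrom != lnc.chrom or not overlaps(pg.span(), lnc.span()):
                continue
            orientation = "sense" if pg.strand == lnc.strand else "antisense"
            exonic = any(overlaps(pe, le) for pe in pg.exons for le in lnc.exons)
            hits.append((pg.name, lnc.name, orientation,
                         "exon" if exonic else "span-only"))
    return hits


# Toy input: hypothetical coordinates, not data from the study.
pseudogenes = [Transcript("PSG1", "chr1", "+", ((100, 200), (300, 400)))]
lncrnas = [
    Transcript("LNC1", "chr1", "-", ((150, 250),)),  # hits PSG1 exon 1, opposite strand
    Transcript("LNC2", "chr1", "+", ((210, 290),)),  # inside PSG1 intron, same strand
    Transcript("LNC3", "chr2", "+", ((100, 400),)),  # different chromosome
]

hits = classify_overlaps(pseudogenes, lncrnas)
for h in hits:
    print(h)
# ('PSG1', 'LNC1', 'antisense', 'exon')
# ('PSG1', 'LNC2', 'sense', 'span-only')
```

Filtering `hits` for `level == "exon"` keeps only exon-to-exon overlaps. The pairwise loop is quadratic and only suits toy inputs; a genome-wide run over catalogs of this size would more plausibly sort records by chromosome and start coordinate, or use an interval index, before testing overlaps.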

Keywords