Genome Biology (Mar 2018)

Full-length mRNA sequencing uncovers a widespread coupling between transcription initiation and mRNA processing

  • Seyed Yahya Anvar,
  • Guy Allard,
  • Elizabeth Tseng,
  • Gloria M. Sheynkman,
  • Eleonora de Klerk,
  • Martijn Vermaat,
  • Raymund H. Yin,
  • Hans E. Johansson,
  • Yavuz Ariyurek,
  • Johan T. den Dunnen,
  • Stephen W. Turner,
  • Peter A. C. ’t Hoen

DOI
https://doi.org/10.1186/s13059-018-1418-0
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 18

Abstract


Background

The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing.

Results

In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, in both proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. Analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The resulting peptide-mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomic analysis of MCF-7 cells.

Conclusions

Our findings demonstrate that our understanding of transcriptome complexity is far from complete, and they provide a basis for revealing the largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.
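The abstract does not state which statistical test was used to call two events "interdependent". As a minimal sketch, assume each full-length read of a gene has already been assigned a transcription start site (TSS) label and a polyadenylation site (PAS) label. Coupling can then be checked with a permutation test of independence on the TSS × PAS contingency table. The labels and the functions `chi_square_statistic` and `permutation_p_value` are hypothetical illustrations, not necessarily the authors' procedure.

```python
# Hypothetical sketch: test whether TSS choice and PAS choice are coupled
# across the full-length reads of a single gene (not the paper's own code).
import random
from collections import Counter


def chi_square_statistic(pairs):
    """Pearson chi-square statistic for the TSS x PAS contingency table.

    `pairs` is a list of (tss_label, pas_label) tuples, one per read.
    """
    n = len(pairs)
    joint = Counter(pairs)                      # observed cell counts
    tss_totals = Counter(t for t, _ in pairs)   # row marginals
    pas_totals = Counter(p for _, p in pairs)   # column marginals
    stat = 0.0
    for t, t_count in tss_totals.items():
        for p, p_count in pas_totals.items():
            expected = t_count * p_count / n    # count expected under independence
            observed = joint[(t, p)]            # Counter returns 0 for unseen cells
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(pairs, n_permutations=2000, seed=0):
    """Permutation p-value for independence of TSS and PAS labels.

    Shuffling the PAS labels relative to the TSS labels keeps both
    marginals fixed while breaking any per-molecule coupling.
    """
    rng = random.Random(seed)
    observed = chi_square_statistic(pairs)
    tss = [t for t, _ in pairs]
    pas = [p for _, p in pairs]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pas)
        # Small tolerance so ties with the observed statistic are counted.
        if chi_square_statistic(list(zip(tss, pas))) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy gene: reads from TSS1 always end at PAS_A, reads from TSS2 at PAS_B.
    coupled = [("TSS1", "PAS_A")] * 40 + [("TSS2", "PAS_B")] * 40
    print(permutation_p_value(coupled))
```

A permutation test is used here because per-gene read counts from full-length sequencing can be small, which makes the asymptotic chi-square p-value unreliable. When such a test is run over thousands of genes, the resulting p-values would also need a multiple-testing correction (for example Benjamini-Hochberg).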