Genomics, Proteomics & Bioinformatics (Oct 2019)

CircAST: Full-length Assembly and Quantification of Alternatively Spliced Isoforms in Circular RNAs

  • Jing Wu
  • Yan Li
  • Cheng Wang
  • Yiqiang Cui
  • Tianyi Xu
  • Chang Wang
  • Xiao Wang
  • Jiahao Sha
  • Bin Jiang
  • Kai Wang
  • Zhibin Hu
  • Xuejiang Guo
  • Xiaofeng Song

Journal volume & issue
Vol. 17, no. 5
pp. 522 – 534

Abstract

Circular RNAs (circRNAs), covalently closed continuous RNA loops, are generated from cognate linear RNAs through back splicing events, and alternative splicing events may generate different circRNA isoforms at the same locus. However, the challenges of reconstruction and quantification of alternatively spliced full-length circRNAs remain unresolved. On the basis of the internal structural characteristics of circRNAs, we developed CircAST, a tool to assemble alternatively spliced circRNA transcripts and estimate their expression by using multiple splice graphs. Simulation studies showed that CircAST correctly assembled the full sequences of circRNAs with a sensitivity of 85.63%–94.32% and a precision of 81.96%–87.55%. By assigning reads to specific isoforms, CircAST quantified the expression of circRNA isoforms with correlation coefficients of 0.85–0.99 between theoretical and estimated values. We evaluated CircAST on an in-house mouse testis RNA-seq dataset with RNase R treatment for enriching circRNAs and identified 380 circRNAs with full-length sequences different from those of their corresponding cognate linear RNAs. RT-PCR and Sanger sequencing analyses validated 32 out of 37 randomly selected isoforms, thus further indicating the good performance of CircAST, especially for isoforms with low abundance. We also applied CircAST to published experimental data and observed substantial diversity in circular transcripts across samples, thus suggesting that circRNA expression is highly regulated. CircAST can be accessed freely at https://github.com/xiaofengsong/CircAST.

Keywords: Circular RNA, Full-length reconstruction, Isoform quantification, Multiple splice graph model, Transcriptome
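
The sensitivity, precision, and correlation figures quoted in the abstract are standard evaluation measures for a simulation study. As a rough illustration of how such measures are typically computed, the Python sketch below compares a set of simulated ("true") isoforms with a set of assembled isoforms and correlates theoretical with estimated expression values. It is not taken from CircAST. The function names, the representation of an isoform as a tuple of chromosome, strand, and exon coordinates, the exact-match criterion, and the use of Pearson's coefficient are all assumptions made for this example, and the toy data are invented.

```python
from math import sqrt


def assembly_metrics(true_isoforms, assembled_isoforms):
    """Return (sensitivity, precision) for a set of assembled isoforms.

    Assumption: an isoform counts as correctly assembled only if its
    full structure (chromosome, strand, exon coordinates) matches a
    simulated isoform exactly.
    """
    true_set = set(true_isoforms)
    assembled_set = set(assembled_isoforms)
    correct = len(true_set & assembled_set)
    # Sensitivity: fraction of simulated isoforms that were recovered.
    sensitivity = correct / len(true_set) if true_set else 0.0
    # Precision: fraction of assembled isoforms that are correct.
    precision = correct / len(assembled_set) if assembled_set else 0.0
    return sensitivity, precision


def pearson(xs, ys):
    """Pearson correlation between theoretical and estimated values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented toy isoforms: (chromosome, strand, ((exon_start, exon_end), ...)).
simulated = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr1", "+", ((100, 200), (500, 600))),
    ("chr2", "-", ((50, 150),)),
    ("chr3", "+", ((10, 90), (120, 180), (210, 260))),
]
assembled = [
    ("chr1", "+", ((100, 200), (300, 400))),
    ("chr2", "-", ((50, 150),)),
    ("chr3", "+", ((10, 90), (120, 180), (210, 260))),
    ("chr3", "+", ((10, 90), (210, 260))),  # not among the simulated isoforms
]

sensitivity, precision = assembly_metrics(simulated, assembled)
correlation = pearson([10.0, 20.0, 30.0, 40.0], [12.0, 19.0, 33.0, 38.0])
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f} r={correlation:.3f}")
```

Exact matching on the full exon structure is a deliberately strict choice here. A looser criterion, such as matching only the back-splice junction, would give different numbers.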