PLoS Computational Biology (Jun 2022)

Improved transcriptome assembly using a hybrid of long and short reads with StringTie.

  • Alaina Shumate,
  • Brandon Wong,
  • Geo Pertea,
  • Mihaela Pertea

DOI
https://doi.org/10.1371/journal.pcbi.1009730
Journal volume & issue
Vol. 18, no. 6
p. e1009730

Abstract

Short-read RNA sequencing and long-read RNA sequencing each have their strengths and weaknesses for transcriptome assembly. While short reads are highly accurate, they are rarely able to span multiple exons. Long-read technology can capture full-length transcripts, but its relatively high error rate often leads to mis-identified splice sites. Here we present a new release of StringTie that performs hybrid-read assembly. By taking advantage of the strengths of both long and short reads, hybrid-read assembly with StringTie is more accurate than long-read only or short-read only assembly, and on some datasets it can more than double the number of correctly assembled transcripts, while obtaining substantially higher precision than the long-read data assembly alone. Here we demonstrate the improved accuracy on simulated data and real data from Arabidopsis thaliana, Mus musculus, and human. We also show that hybrid-read assembly is more accurate than correcting long reads prior to assembly while also being substantially faster. StringTie is freely available as open source software at https://github.com/gpertea/stringtie.