Extending rnaSPAdes functionality for hybrid transcriptome assembly

Andrey D. Prjibelski; Giuseppe D. Puglia; Dmitry Antipov; Elena Bushmanova; Daniela Giordano; Alla Mikheenko; Domenico Vitale; Alla Lapidus

doi:10.1186/s12859-020-03614-2

BMC Bioinformatics (Jul 2020)

Affiliations

Andrey D. Prjibelski: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University
Giuseppe D. Puglia: Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo
Dmitry Antipov: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University
Elena Bushmanova: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University
Daniela Giordano: Department of Electrical, Electronics and Computer Engineering, University of Catania
Alla Mikheenko: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University
Domenico Vitale: Consiglio Nazionale delle Ricerche, Istituto per i Sistemi Agricoli e Forestali del Mediterraneo
Alla Lapidus: Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, St. Petersburg State University

DOI: https://doi.org/10.1186/s12859-020-03614-2
Journal volume & issue: Vol. 21, no. S12
pp. 1 – 9

Abstract

Background: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when a reference genome is not available or is poorly annotated. However, due to the short length of Illumina reads, it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads with technologies such as PacBio and Oxford Nanopore may dramatically improve assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads have recently been developed, there is no established pipeline for de novo assembly of such data.

Results: In this work we present a novel method for performing high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information derived from long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data.

Conclusion: To evaluate the benefit of using long RNA reads, we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms compared to assemblies that use only short-read data.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
