Genome Biology (Jan 2024)

GTax: improving de novo transcriptome assembly by removing foreign RNA contamination

  • Roberto Vera Alvarez
  • David Landsman

DOI
https://doi.org/10.1186/s13059-023-03141-2
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 21

Abstract

The cost and complexity of generating a complete reference genome mean that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This approach is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
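
The general idea of taxonomy-based read filtering can be illustrated with a minimal sketch. This is not the GTax pipeline itself. It assumes BLAST hits have already been written as tab-separated lines with three columns (query ID, subject taxonomy IDs, bit score), for example via a custom tabular output such as `-outfmt "6 qseqid staxids bitscore"`. The function names `best_hits` and `foreign_reads` are hypothetical, and the rule used here (flag a read when its highest-scoring hit falls outside a set of accepted taxonomy IDs) is a simplifying assumption.

```python
def best_hits(blast_lines):
    """Return {read_id: (set_of_taxids, bitscore)} for the top-scoring hit per read.

    Assumes each line is: qseqid <TAB> staxids <TAB> bitscore,
    where staxids may hold several IDs separated by ';'.
    """
    best = {}
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        read_id, taxids, bitscore = line.split("\t")[:3]
        score = float(bitscore)
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (set(taxids.split(";")), score)
    return best


def foreign_reads(blast_lines, target_taxids):
    """Return IDs of reads whose best hit shares no taxid with target_taxids.

    Reads with no BLAST hit at all never appear in the input,
    so they are neither flagged nor cleared by this sketch.
    """
    targets = set(target_taxids)
    return {
        read_id
        for read_id, (taxids, _score) in best_hits(blast_lines).items()
        if not (taxids & targets)
    }


if __name__ == "__main__":
    # Illustrative hits only: 4081 is the NCBI taxonomy ID for
    # Solanum lycopersicum, 562 for Escherichia coli, 9606 for Homo sapiens.
    hits = [
        "r1\t4081\t200",
        "r1\t562\t50",
        "r2\t562\t180",
        "r3\t9606\t90",
        "r3\t4081\t95",
    ]
    print(sorted(foreign_reads(hits, {"4081"})))
```

In practice the accepted set would usually cover a whole lineage rather than a single species ID, and flagged reads would be dropped from the FASTQ files before assembly. Both steps are left out of this sketch.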