BMC Bioinformatics (Jul 2020)

SLFinder, a pipeline for the novel identification of splice-leader sequences: a good enough solution for a complex problem

  • Javier Calvelo,
  • Hernán Juan,
  • Héctor Musto,
  • Uriel Koziol,
  • Andrés Iriarte

DOI
https://doi.org/10.1186/s12859-020-03610-6
Journal volume & issue
Vol. 21, no. 1
pp. 1 – 18

Abstract

Background: Spliced leader (SL) trans-splicing is an important mechanism for the maturation of mRNAs in several lineages of eukaryotes, including several groups of parasites of great medical and economic importance. Nevertheless, its study across the tree of life is severely hindered by the difficulty of identifying the SL sequences that are being trans-spliced.

Results: In this paper we present SLFinder, a four-step pipeline meant to identify candidate SL sequences de novo while making very few assumptions about the properties of the SL sequence. The pipeline takes de novo transcriptome assemblies and a reference genome as input, and allows user intervention at several points to account for unexpected features of the dataset. The strategy and its implementation were tested on real RNA-seq data from species with and without SL trans-splicing.

Conclusions: SLFinder is capable of identifying SL candidates with good precision in a reasonable amount of time. It is especially suitable for species with unknown SL sequences, generating candidate sequences for further refinement and experimental validation.

Keywords