BMC Bioinformatics (Sep 2024)
SmithHunter: a workflow for the identification of candidate smithRNAs and their targets
Abstract
Background: SmithRNAs (Small MITochondrial Highly-transcribed RNAs) are a novel class of small RNA molecules that are encoded in the mitochondrial genome and regulate the expression of nuclear transcripts. Initial evidence for their existence came from the Manila clam Ruditapes philippinarum, where they were first described and their activity was biologically validated through RNA injection experiments. Current evidence for these RNAs in other species is based only on small RNA sequencing. As a preliminary step towards characterizing smithRNAs across different metazoan lineages, a dedicated, unified analytical workflow is needed.

Results: We propose a novel workflow specifically designed for smithRNAs. Small RNA sequencing reads that map uniquely to the mitochondrial genome are clustered into putative smithRNAs and prefiltered based on their abundance, their presence in replicate libraries and the conservation of their 5′ and 3′ transcription boundaries. The surviving sequences are then compared to the untranslated regions of nuclear transcripts to identify possible targets, based on seed pairing, overall match and thermodynamic stability. Ample collateral information and graphics are produced to help characterize these molecules in the species of choice and to guide the operator through the analysis. The workflow was tested on the original Manila clam data. Under basic settings, the results of the original study are largely replicated. The effect of additional parameter customization (clustering threshold, stringency, minimum number of replicates, seed matching) was further evaluated.

Conclusions: The study of smithRNAs is still in its infancy and no dedicated analytical workflow is currently available. At its core, the SmithHunter workflow builds on the bioinformatic procedure originally applied to identify candidate smithRNAs in the Manila clam. This is currently the only evidence for smithRNAs that has been biologically validated and is therefore the natural starting point for characterizing smithRNAs in other species. The original analysis was adapted to current software implementations and some minor issues were resolved. Moreover, the workflow was improved by allowing the customization of several analytical parameters, mostly concerning stringency and the possibility of accounting for a minimal level of genetic differentiation among samples.
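As an illustration of two of the steps summarized in the Results, the following minimal Python sketch shows a replicate-aware abundance prefilter and a seed-pairing check against an untranslated region. It is not the SmithHunter implementation: the function names, the default thresholds (`min_reads`, `min_replicates`) and the miRNA-like seed convention (positions 2 to 8 of the small RNA) are assumptions made for this example, and the sequences are toy data.

```python
# Illustrative sketch only; names, thresholds and the seed window are assumptions,
# not taken from SmithHunter.

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def passes_prefilter(counts: list[int], min_reads: int = 10,
                     min_replicates: int = 2) -> bool:
    """Keep a cluster if at least `min_replicates` libraries each hold
    `min_reads` or more reads (hypothetical thresholds)."""
    return sum(c >= min_reads for c in counts) >= min_replicates


def seed_sites(small_rna: str, utr: str, seed_start: int = 2,
               seed_end: int = 8) -> list[int]:
    """Return 0-based UTR positions with perfect Watson-Crick pairing
    to the seed (1-based positions seed_start..seed_end of the small RNA)."""
    seed = small_rna[seed_start - 1:seed_end]
    site = reverse_complement(seed)  # sequence the seed would pair with
    positions = []
    i = utr.find(site)
    while i != -1:
        positions.append(i)
        i = utr.find(site, i + 1)  # allow overlapping sites
    return positions


if __name__ == "__main__":
    candidate = "UAGCUUAUCAGACUGAUGUUGA"  # toy small RNA
    utr = "GGGAUAAGCUCC"                   # toy 3' UTR fragment
    if passes_prefilter([25, 0, 14]):
        print(seed_sites(candidate, utr))
```

A real analysis would additionally score the overall match and the thermodynamic stability of each candidate duplex, which this sketch deliberately leaves out.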
Keywords