lncEvo: automated identification and conservation study of long noncoding RNAs

Oleksii Bryzghalov; Izabela Makałowska; Michał Wojciech Szcześniak

doi:10.1186/s12859-021-03991-2

BMC Bioinformatics (Feb 2021)

Affiliations

Oleksii Bryzghalov: Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan
Izabela Makałowska: Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan
Michał Wojciech Szcześniak: Institute of Human Biology and Evolution, Faculty of Biology, Adam Mickiewicz University in Poznan

Journal volume & issue: Vol. 22, no. 1
pp. 1 – 14

Abstract

Background: Long noncoding RNAs (lncRNAs) represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed not to encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been shown to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to study their evolution in more depth. In contrast to protein-encoding transcripts, however, lncRNAs do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs.

Results: To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) a conservation study, that is, a genome-wide comparison of lncRNA transcriptomes between two species of interest, including a search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part.

Conclusions: lncEvo is an all-in-one tool built with the Nextflow framework. It uses state-of-the-art software and algorithms, offers customizable trade-offs between speed and sensitivity, and provides ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
