Journal of Holistic Integrative Pharmacy (Jun 2024)
Cyclotides prediction in Leptopetalum biflorum based on de novo transcriptome assembly and annotation
Abstract
Objective: Few transcriptome sequencing data are available for Leptopetalum biflorum, and many of its cyclotides remain undiscovered. This study aimed to establish a workflow based on de novo transcriptome assembly and to systematically predict cyclotides in Leptopetalum biflorum, providing a reference for the functional analysis of cyclotides. Methods: We performed RNA-seq on roots, leaves, and flowers of Leptopetalum biflorum to obtain two sets of transcriptome data. Sequencing quality was assessed using FastQC and MultiQC. De novo transcriptome assembly was carried out using Trinity, and assembly quality was evaluated with the read support method and BUSCO analysis. eggNOG-mapper and Trinotate were used to annotate GO functional terms and KEGG pathways. TransDecoder was used to predict ORFs and coding regions, and SignalP was used to predict amino acid sequences containing signal peptides and their cleavage sites. The mature protein sequences were then used for cyclotide prediction in Leptopetalum biflorum via FindCRP 2.0 (Find Cyclotide Peptide), a cyclotide prediction tool developed by our team. Results: Trinity assembled a total of 171,310 transcripts and 103,299 isoforms (genes). The average transcript length was 1139.89 bp, and the average gene length was 780.87 bp. Approximately 30% of the genes showed homology to sequences from other plant species. Of the assembled genes, 23,265 (22.52%) were annotated with 41 GO terms at Level 2. KEGG pathway annotation showed that 23,682 genes (22.92%) carried 5,171 KO annotations and were involved in 484 pathways. FindCRP predicted 17 potential cyclotides, of which 15 had homologous genes; notably, five potential cyclotides showed complete identity (100%) to their respective homologous genes.
Additionally, two potential cyclotide sequences without any identified homologs appeared capable of forming a cyclic structure according to the 3D structure prediction results. Conclusion: We developed a de novo transcriptome assembly workflow for identifying cyclotides from RNA-seq data of Leptopetalum biflorum, and used our custom-built tool, FindCRP, within this workflow to detect potential cyclotides. The workflow is designed to support the reproducibility and reliability of the findings. We performed transcript annotation and predicted putative cyclotides. These potential cyclotides show significant homology to known cyclotides.
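The abstract does not describe how FindCRP scores candidates, so the following is only a minimal illustrative sketch of the kind of pre-filter that could be applied to mature protein sequences before a dedicated predictor is run. It relies on the general observation that cyclotides are short peptides with six conserved cysteines. The length bounds, the function names, and the toy sequences are all assumptions made for illustration; none of them come from the study or from FindCRP.

```python
# Illustrative pre-filter only; this is NOT the FindCRP algorithm.
# Assumed bounds: chosen for the example, not taken from the abstract.
MIN_LEN, MAX_LEN = 25, 40
N_CYS = 6  # cyclotides carry six conserved cysteines


def cysteine_spacing(seq):
    """Return the number of residues between consecutive cysteines."""
    positions = [i for i, aa in enumerate(seq) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def looks_cyclotide_like(seq):
    """Crude check: plausible length and exactly six cysteines."""
    seq = seq.strip().upper()
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return False
    return seq.count("C") == N_CYS


def screen(records):
    """Return the names of sequences that pass the crude check."""
    return [name for name, seq in records.items() if looks_cyclotide_like(seq)]


# Hypothetical toy sequences (synthetic, not real peptides).
TOY_PASS = "GAAAC" + "AAAC" + "AAAAC" + "AAAAC" + "AC" + "AAAAC" + "AAA"
TOY_RECORDS = {
    "toy_pass": TOY_PASS,               # 29 residues, six cysteines
    "toy_no_cys": "A" * 30,             # right length, no cysteines
    "toy_too_long": "C" * 6 + "A" * 60  # six cysteines, too long
}

if __name__ == "__main__":
    print(screen(TOY_RECORDS))
    print(cysteine_spacing(TOY_PASS))
```

A filter this simple would pass many non-cyclotide cysteine-rich peptides, which is why a dedicated predictor, homology search, and 3D structure prediction are still needed to confirm candidates.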