Biology (Aug 2022)

Making the Most of Its Short Reads: A Bioinformatics Workflow for Analysing the Short-Read-Only Data of Leishmania orientalis (Formerly Named Leishmania siamensis) Isolate PCM2 in Thailand

  • Pornchai Anuntasomboon,
  • Suradej Siripattanapipong,
  • Sasimanas Unajak,
  • Kiattawee Choowongkomon,
  • Richard Burchmore,
  • Saovanee Leelayoova,
  • Mathirut Mungthin,
  • Teerasak E-kobon

DOI
https://doi.org/10.3390/biology11091272
Journal volume & issue
Vol. 11, no. 9
p. 1272

Abstract


Background: Leishmania orientalis (formerly named Leishmania siamensis) has been neglected for years in Thailand. Genomic study of L. orientalis has recently gained much attention following the release of the first high-quality reference genome, from isolate LSCM4. Integrating multiple sequencing platforms for whole-genome sequencing has proven effective, but at considerable cost. This study presents a preliminary bioinformatic workflow that couples multi-step de novo assembly with reference-based assembly to produce high-quality genomic drafts from the short-read Illumina sequence data of L. orientalis isolate PCM2.

Results: Integrating multi-step de novo assembly with MEGAHIT and SPAdes, a reference-based method using the L. enriettii genome, and salvage of the unmapped reads produced a 30.27 Mb genomic draft of L. orientalis isolate PCM2 with 3367 contigs and 8887 predicted genes. When compared with the genome of L. orientalis isolate LSCM4, collected from a northern province of Thailand, the integrated approach showed the best integrity, coverage, and contig alignment. Similar patterns of gene ratios and frequencies were observed in the GO biological process annotation. Fifty GO terms were assigned to the assembled genomes; 23 of these (accounting for 61.6% of the annotated genes) showed higher gene counts and ratios in the draft from our workflow than in the LSCM4 isolate.

Conclusions: These results indicate that the proposed bioinformatic workflow produced a genome of L. orientalis isolate PCM2 of acceptable quality for functional genomic analysis, making the most of the short-read data. The workflow could provide the extensive information needed to identify strain-specific markers and virulence-associated genes for drug and vaccine development before a more exhaustive and expensive investigation is undertaken.
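The abstract summarises the PCM2 draft by its total size (30.27 Mb) and contig count (3367). As an illustration only, here is a minimal Python sketch of how such summary figures, plus the commonly used N50 statistic, might be computed from an assembly FASTA file. The function names are hypothetical, N50 is not a value reported in the abstract, and this is not the authors' pipeline.

```python
"""Hypothetical helpers for summarising a draft assembly (not the authors' code)."""

from typing import Dict, Iterable, List, Optional


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted lines.

    Handles sequences wrapped over several lines; blank lines are ignored.
    """
    lengths: List[int] = []
    current: Optional[int] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the contig at which the cumulative length, summed from the
    longest contig down, first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_summary(lengths: List[int]) -> Dict[str, float]:
    """Contig count, total length in megabases, and N50."""
    return {
        "contigs": len(lengths),
        "total_mb": sum(lengths) / 1_000_000,
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Toy FASTA with contigs of 8, 6, 4 and 2 bp (the first is wrapped).
    toy = [">a", "ACGT", "ACGT", ">b", "ACGTAC", ">c", "ACGT", ">d", "AC"]
    print(assembly_summary(read_fasta_lengths(toy)))
```

On a real draft, the lines would come from an open file handle, for example `read_fasta_lengths(open("contigs.fasta"))`, where the file name is a placeholder.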

Keywords