Frontiers in Marine Science (Jan 2023)

Symbiont-screener: A reference-free tool to separate host sequences from symbionts for error-prone long reads

  • Mengyang Xu,
  • Lidong Guo,
  • Yanwei Qi,
  • Chengcheng Shi,
  • Xiaochuan Liu,
  • Jianwei Chen,
  • Jinglin Han,
  • Li Deng,
  • Xin Liu,
  • Guangyi Fan

DOI
https://doi.org/10.3389/fmars.2023.1087447
Journal volume & issue
Vol. 10

Abstract

Metagenomic sequencing facilitates large-scale compositional analysis and functional characterization of complex microbial communities without cultivation. Recent advances in long-read sequencing provide long-range information that simplifies repeat-aware metagenomic assembly and complex genome binning. However, it remains methodologically challenging to remove host-derived DNA sequences from the microbial community at read-level resolution, owing to high sequencing error rates and the absence of reference genomes. Here we present Symbiont-Screener (https://github.com/BGI-Qingdao/Symbiont-Screener), a reference-free approach that uses a trio-based screening model to separate high-confidence host long reads from those of symbionts and contaminants despite low sequencing accuracy. The remaining host sequences are then grouped automatically by unsupervised clustering. On both simulated and real long-read datasets, it identifies the host’s raw reads with higher precision and recall than other tools, and hence promises high-quality reconstruction of the host genome and associated metagenomes. Furthermore, we used both PacBio HiFi and nanopore long reads to separate the host’s sequences in a real host-microbe system, an algal-bacterial sample, and obtained a clear improvement in the contiguity, completeness, and purity of the host assembly. More importantly, the residual symbiotic microbiomes show improved genomic profiling and assemblies after screening, which provides a solid data basis for downstream bioinformatic analyses and a novel perspective on symbiotic research.

Keywords