mBio (May 2025)
Eukfinder: a pipeline to retrieve microbial eukaryote genome sequences from metagenomic data
Abstract
ABSTRACT Whole-genome shotgun (WGS) metagenomic sequencing of microbial communities enables the discovery of the functions, physiologies, and evolutionary histories of prokaryotic and eukaryotic microbes. However, metagenomic studies of microbial eukaryotes lag due to challenges in identifying and assembling high-quality genomes from WGS data. To address this problem, we developed Eukfinder, a bioinformatics pipeline that identifies potential eukaryotic sequences from WGS metagenomic data, with a complementary binning workflow for recovering nuclear and mitochondrial genomes. Eukfinder uses two specialized databases for read/contig classification, customizable to specific data sets or environments. We tested Eukfinder on simulated gut microbiome data sets which included varying numbers of reads from the protist Blastocystis, a human gut commensal. We also applied Eukfinder to previously published human gut microbiome WGS metagenomic data to recover new genomes of Blastocystis. Compared to other workflows, Eukfinder offers the potential to recover high-quality, near-complete genomes of diverse eukaryotes, including different Blastocystis subtypes, without relying on a reference genome. With sufficient sequencing depth, Eukfinder outperforms similar tools for recovering eukaryotic genomes from metagenomic data. Eukfinder is a valuable tool for reference-independent and cultivation-free studies of eukaryotic microbial genomes from environmental WGS metagenomic samples.
IMPORTANCE Advancements in next-generation sequencing have made whole-genome shotgun (WGS) metagenomic sequencing an efficient method for de novo reconstruction of microbial genomes from various environments. Thousands of new prokaryotic genomes have been characterized; however, the large size and complexity of protistan genomes have hindered the use of WGS metagenomics to sample microbial eukaryotic diversity. Eukfinder enables the recovery of eukaryotic microbial genomes from environmental WGS metagenomic samples. Retrieval of high-quality protistan genomes from diverse metagenomic samples increases the number of reference genomes available. This aids future metagenomic investigations into the functions, physiologies, and evolutionary histories of eukaryotic microbes in the gut microbiome and other ecosystems.
Keywords