IEEE Access (Jan 2023)

BinChill: A Metagenomic Binning Ensemble Method

  • Oliver S. Bak,
  • Marcus D. Jensen,
  • Frederik M. Trudslev,
  • Andreas Windfeld,
  • Andre Lamurias

DOI
https://doi.org/10.1109/ACCESS.2023.3277755
Journal volume & issue
Vol. 11
pp. 49561 – 49577

Abstract

The goal of metagenomic binning is to reconstruct genomes from a mixture of DNA sequences into genomic bins, which can be considered a clustering task. Multiple methods have been proposed for this task, such as distance-based metrics, machine learning, and ensemble approaches. We propose BinChill, a metagenomic ensemble method based on the generic co-occurrence ensembler method, ACE. BinChill incorporates domain information in the form of Single-Copy Genes (SCG) with a co-occurrence strategy. This strategy combines multiple clustering partitions according to how often two items co-occur in the same cluster. On a smaller simulated dataset, BinChill reconstructed as many or more high- and medium-quality bins than other metagenomics-specific methods, with an equal or faster runtime. On larger datasets, both simulated and real-world, BinChill outperformed other methods in reconstructing high-quality bins, at the cost of an increased processing time when compared to generic ensemble clustering algorithms. This is due to the domain-specific steps that our method implements. Our results show that the strengths of multiple partitions can be combined to generate a partition of higher quality.
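
The co-occurrence idea described in the abstract can be illustrated with a minimal Python sketch. It is not BinChill's or ACE's implementation: it only counts, for each pair of items, the fraction of input partitions that place the pair in the same cluster, and it omits the SCG-based steps and the merging logic entirely. The function name `co_occurrence` and the contig labels `c1` to `c4` are hypothetical, chosen only for this example.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(partitions):
    """Return, for each pair of items, the fraction of partitions
    in which both items fall in the same cluster.

    `partitions` is a list of partitions; each partition is a list of
    clusters; each cluster is a set of item labels. Pairs that never
    share a cluster are absent from the result.
    """
    counts = Counter()
    for partition in partitions:
        for cluster in partition:
            # Sort so each unordered pair always maps to the same key.
            for a, b in combinations(sorted(cluster), 2):
                counts[(a, b)] += 1
    n = len(partitions)
    return {pair: count / n for pair, count in counts.items()}


# Three hypothetical binning results over four contigs (illustrative only).
partitions = [
    [{"c1", "c2", "c3"}, {"c4"}],
    [{"c1", "c2"}, {"c3", "c4"}],
    [{"c1", "c2"}, {"c3"}, {"c4"}],
]
scores = co_occurrence(partitions)
```

Here `c1` and `c2` share a cluster in all three partitions, so their score is the highest possible, while `c1` and `c4` never co-occur and have no entry. An ensemble method could use such scores as evidence for which items belong in the same final bin.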

Keywords