PeerJ (Sep 2023)

Integrating chromatin conformation information in a self-supervised learning model improves metagenome binning

  • Harrison Ho,
  • Mansi Chovatia,
  • Rob Egan,
  • Guifen He,
  • Yuko Yoshinaga,
  • Ivan Liachko,
  • Ronan O’Malley,
  • Zhong Wang

DOI: https://doi.org/10.7717/peerj.16129
Journal volume & issue: Vol. 11, p. e16129

Abstract


Metagenome binning is a key step, downstream of metagenome assembly, to group scaffolds by their genome of origin. Although accurate binning has been achieved on datasets containing multiple samples from the same community, the completeness of binning is often low in datasets with a small number of samples due to a lack of robust species co-abundance information. In this study, we exploited the chromatin conformation information obtained from Hi-C sequencing and developed a new reference-independent algorithm, Metagenome Binning with Abundance and Tetra-nucleotide frequencies—Long Range (metaBAT-LR), to improve the binning completeness of these datasets. This self-supervised algorithm builds a model from a set of high-quality genome bins to predict scaffold pairs that are likely to be derived from the same genome. Then, it applies these predictions to merge incomplete genome bins, as well as recruit unbinned scaffolds. We validated metaBAT-LR’s ability to bin-merge and recruit scaffolds on both synthetic and real-world metagenome datasets of varying complexity. Benchmarking against similar software tools suggests that metaBAT-LR uncovers unique bins that were missed by all other methods. MetaBAT-LR is open-source and is available at https://bitbucket.org/project-metabat/metabat-lr.
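The abstract describes the workflow only at a high level. The Python sketch below illustrates the general idea of self-supervision it mentions. Scaffold pairs inside high-quality bins serve as positive examples, and pairs drawn from different high-quality bins serve as negatives. The resulting classifier then scores Hi-C-linked scaffold pairs, first to merge incomplete bins and then to recruit unbinned scaffolds. This is a minimal sketch under stated assumptions, not metaBAT-LR's implementation. The length-normalized contact feature, the one-feature logistic regression, the mean-probability aggregation, the 0.5 threshold, the input formats, and all function names (`refine_bins`, `contact_feature`, `training_pairs`, and so on) are hypothetical.

```python
import math
from itertools import combinations

# Illustrative sketch only: NOT metaBAT-LR's actual model, features, or API.
# Assumed (hypothetical) input formats:
#   bins:     {bin_id: [scaffold_id, ...]}
#   contacts: {frozenset({scaffold_a, scaffold_b}): hic_read_pair_count}
#   lengths:  {scaffold_id: length_in_bp}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def contact_feature(a, b, contacts, lengths):
    """Assumed feature: log of Hi-C read pairs normalized by both scaffold lengths."""
    n = contacts.get(frozenset((a, b)), 0)
    return math.log1p(1e6 * n / math.sqrt(lengths[a] * lengths[b]))


def training_pairs(hq_bins):
    """Self-supervised labels: same high-quality bin -> 1, different bins -> 0."""
    pairs = []
    for members in hq_bins.values():
        pairs += [(a, b, 1) for a, b in combinations(members, 2)]
    for m1, m2 in combinations(hq_bins.values(), 2):
        pairs += [(a, b, 0) for a in m1 for b in m2]
    return pairs


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """One-feature logistic regression fitted by batch gradient descent."""
    w = bias = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + bias) - y  # gradient of log-loss w.r.t. the logit
            gw += err * x
            gb += err
        w -= lr * gw / len(xs)
        bias -= lr * gb / len(xs)
    return w, bias


def mean_prob(group1, group2, model, contacts, lengths):
    """Average predicted same-genome probability over all cross pairs."""
    w, bias = model
    probs = [sigmoid(w * contact_feature(s1, s2, contacts, lengths) + bias)
             for s1 in group1 for s2 in group2]
    return sum(probs) / len(probs)


def refine_bins(bins, hq_ids, unbinned, contacts, lengths, threshold=0.5):
    """Train on high-quality bins, then merge incomplete bins and recruit scaffolds."""
    data = training_pairs({k: bins[k] for k in hq_ids})
    xs = [contact_feature(a, b, contacts, lengths) for a, b, _ in data]
    ys = [y for _, _, y in data]
    model = fit_logistic(xs, ys)
    bins = {k: list(v) for k, v in bins.items()}  # work on a copy of the input

    # Merge pairs of incomplete bins whose cross pairs look like same-genome pairs.
    incomplete = [k for k in bins if k not in hq_ids]
    for k1, k2 in combinations(incomplete, 2):
        if k1 in bins and k2 in bins:
            if mean_prob(bins[k1], bins[k2], model, contacts, lengths) > threshold:
                bins[k1] += bins.pop(k2)

    # Recruit each unbinned scaffold into its best-scoring bin, if confident enough.
    for s in unbinned:
        scores = {k: mean_prob([s], m, model, contacts, lengths)
                  for k, m in bins.items()}
        best = max(scores, key=scores.get)
        if scores[best] > threshold:
            bins[best].append(s)
    return bins
```

Averaging probabilities over all cross pairs is a deliberately conservative choice in this sketch: when bins contain many scaffolds, one spurious Hi-C link contributes little to the score. A practical tool would need richer pairwise features and safeguards against merges that increase contamination; both are outside the scope of this illustration.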

Keywords