IEEE Access (Jan 2022)

A Deep Embedded Clustering Algorithm for the Binning of Metagenomic Sequences

  • Huynh Quang Bao,
  • Le Van Vinh,
  • Tran Van Hoai

DOI
https://doi.org/10.1109/ACCESS.2022.3176954
Journal volume & issue
Vol. 10
pp. 54348 – 54357

Abstract

The study of metagenomic sequences provides a deep understanding of microbial communities. A crucial step in metagenomic projects is assigning sequences to the organisms they originate from, known as the binning problem. Among emerging classification methods, deep learning is a promising technology that can achieve high accuracy. However, the reference databases that deep learning based methods depend on are not always available. As a result, some existing binning solutions apply unsupervised learning, but exploiting the strength of deep learning in an unsupervised model remains a challenging problem. This work proposes MetaDEC, a binning algorithm for metagenomic sequences that takes a deep unsupervised learning approach. Following a two-phase paradigm, the algorithm first divides sequences into groups of overlapping sequences. The groups are then assigned to clusters using an adversarial deep embedded clustering technique. Experimental results show that MetaDEC achieves competitive performance compared with existing methods on both simulated and real metagenomic data.
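
The first phase of the two-phase paradigm, grouping overlapping sequences, can be illustrated with a minimal sketch. The abstract does not say how overlaps are detected, so the criterion used here (two reads are linked if they share at least one exact k-mer) is an assumption, as are the function name `group_overlapping_reads` and the default `k=8`. This is not the MetaDEC implementation, and the second phase (adversarial deep embedded clustering) is not shown.

```python
from collections import defaultdict


def group_overlapping_reads(reads, k=8):
    """Group reads that share at least one exact k-mer.

    Assumption: a shared k-mer is used as a crude proxy for sequence
    overlap; the paper's actual overlap criterion may differ.
    Returns a sorted list of groups, each a list of read indices.
    """
    parent = list(range(len(reads)))  # union-find forest over read indices

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as the root for a stable result.
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            if kmer in first_seen:
                union(idx, first_seen[kmer])
            else:
                first_seen[kmer] = idx

    groups = defaultdict(list)
    for idx in range(len(reads)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


# Toy reads (hypothetical data): reads 0 and 1 overlap, as do reads 2 and 3.
toy_reads = ["ACGTACGTAA", "CGTACGTAAG", "TTTTGGGGCC", "TTGGGGCCAA"]
toy_groups = group_overlapping_reads(toy_reads, k=8)
```

In a two-phase design of this kind, each group rather than each individual read would be passed to the clustering phase, so that short reads pool their composition signal before being assigned to a cluster.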

Keywords