PeerJ (May 2021)

ToRQuEMaDA: tool for retrieving queried Eubacteria, metadata and dereplicating assemblies

  • Raphaël R. Léonard,
  • Marie Leleu,
  • Mick Van Vlierberghe,
  • Luc Cornet,
  • Frédéric Kerff,
  • Denis Baurain

DOI
https://doi.org/10.7717/peerj.11348
Journal volume & issue
Vol. 9
p. e11348

Abstract

TQMD is a tool for high-performance computing clusters that downloads and stores prokaryotic genomes and produces lists of dereplicated genomes. It was developed to cope with the ever-growing number of prokaryotic genomes and their uneven taxonomic distribution. To remain both efficient and scalable, it combines word-based alignment-free methods (k-mers), an iterative single-linkage approach and a divide-and-conquer strategy. We studied the performance of TQMD by assessing the influence of its parameters and heuristics on the clustering outcome. We further compared TQMD to two other dereplication tools (dRep and Assembly-Dereplicator). Our results showed that, unlike these two tools, TQMD is primarily optimized to dereplicate at higher taxonomic levels (phylum/class), although it also works at lower taxonomic levels (species/strain), as they do. TQMD is available from source and as a Singularity container at https://bitbucket.org/phylogeno/tqmd.
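
To make the general idea of k-mer-based single-linkage dereplication concrete, here is a minimal Python sketch. It is not TQMD's implementation: the function names, the Jaccard distance, the default values of k and the threshold, and the rule for picking a representative (the alphabetically first name) are all assumptions chosen for illustration. The iterative and divide-and-conquer aspects described above are omitted.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers (length-k substrings) in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two k-mer sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def single_linkage_clusters(genomes, k, threshold):
    """Group genomes whose k-mer Jaccard distance is <= threshold.

    Single linkage: two genomes end up in the same cluster if a chain
    of pairwise-close genomes connects them (tracked with union-find).
    """
    names = list(genomes)
    profiles = {n: kmers(genomes[n], k) for n in names}
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


def dereplicate(genomes, k=4, threshold=0.5):
    """Keep one representative per cluster (hypothetical rule: first name)."""
    return [c[0] for c in single_linkage_clusters(genomes, k, threshold)]
```

Called on a dictionary that maps genome names to sequences, `dereplicate` returns one name per cluster. A real tool would pick representatives using assembly quality or taxonomic criteria rather than name order, and would avoid the all-against-all comparison used here, which scales quadratically with the number of genomes.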

Keywords