BMC Bioinformatics (Sep 2018)

DM-PhyClus: a Bayesian phylogenetic algorithm for infectious disease transmission cluster inference

  • Luc Villandré,
  • Aurélie Labbe,
  • Bluma Brenner,
  • Michel Roger,
  • David A Stephens

DOI
https://doi.org/10.1186/s12859-018-2347-3
Journal volume & issue
Vol. 19, no. 1
pp. 1 – 16

Abstract

Background: Conventional phylogenetic clustering approaches rely on arbitrary cutpoints applied a posteriori to phylogenetic estimates. Although in practice, Bayesian and bootstrap-based clustering tend to lead to similar estimates, they often produce conflicting measures of confidence in clusters. The current study proposes a new Bayesian phylogenetic clustering algorithm, which we refer to as DM-PhyClus (Dirichlet-Multinomial Phylogenetic Clustering), that identifies sets of sequences resulting from quick transmission chains, thus yielding easily-interpretable clusters, without using any ad hoc distance or confidence requirement.

Results: Simulations reveal that DM-PhyClus can outperform conventional clustering methods, as well as the Gap procedure, a pure distance-based algorithm, in terms of mean cluster recovery. We apply DM-PhyClus to a sample of real HIV-1 sequences, producing a set of clusters whose inference is in line with the conclusions of a previous thorough analysis.

Conclusions: DM-PhyClus, by eliminating the need for cutpoints and producing sensible inference for cluster configurations, can facilitate transmission cluster detection. Future efforts to reduce incidence of infectious diseases, like HIV-1, will need reliable estimates of transmission clusters. It follows that algorithms like DM-PhyClus could serve to better inform public health strategies.

Keywords