PLoS Computational Biology (Sep 2021)

An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes.

  • Thomas K F Wong,
  • Teng Li,
  • Louis Ranjard,
  • Steven H Wu,
  • Jeet Sukumaran,
  • Allen G Rodrigo

DOI
https://doi.org/10.1371/journal.pcbi.1008949
Journal volume & issue
Vol. 17, no. 9
p. e1008949

Abstract


A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to those SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, assuming that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy in recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.
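The link between SNP patterns in a pool and the underlying tree can be illustrated with a toy sketch. Assuming no recurrent mutation (infinite sites), reads already placed on the reference without indels, and no sequencing error, a mutation on a branch is carried by every haplotype below that branch. Its pooled allele frequency should therefore equal the summed relative abundance of that clade. The functions `site_frequencies` and `clade_frequencies` below are hypothetical names for illustration only. This is not AFPhyloMix's implementation and omits its Bayesian model entirely.

```python
from collections import Counter


def site_frequencies(reference, reads):
    """Observed alternate-allele frequency at each variable site.

    Assumption: each read is (start_position, sequence), already aligned to
    `reference` with no indels. Returns {position: (alt_base, alt_fraction)}
    for positions where at least one read differs from the reference.
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            counts[start + offset][base] += 1
    snps = {}
    for pos, column in enumerate(counts):
        depth = sum(column.values())
        alts = {b: n for b, n in column.items() if b != reference[pos]}
        if depth and alts:
            alt_base = max(alts, key=alts.get)  # most common non-reference base
            snps[pos] = (alt_base, alts[alt_base] / depth)
    return snps


def clade_frequencies(tree, abundance):
    """Expected pooled frequency of a mutation on the branch above each clade.

    `tree` is a nested tuple of haplotype names (leaves are strings);
    `abundance` maps each haplotype to its relative abundance (summing to 1).
    Under the infinite-sites assumption, a mutation above a clade appears at
    the summed abundance of the haplotypes in that clade. The root clade
    (frequency 1) corresponds to a fixed difference from the reference.
    """
    out = {}

    def walk(node):
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(walk(child) for child in node))
        out[leaves] = sum(abundance[h] for h in leaves)
        return leaves

    walk(tree)
    return out


if __name__ == "__main__":
    # Hypothetical three-haplotype mixture: ((A, B), C).
    freqs = clade_frequencies((("A", "B"), "C"), {"A": 0.5, "B": 0.25, "C": 0.25})
    for clade, f in sorted(freqs.items(), key=lambda kv: kv[1]):
        print(sorted(clade), f)
    print(site_frequencies("ACGT", [(0, "ACGA"), (0, "AC"), (2, "GT"), (3, "A")]))
```

In this toy setting, matching each observed SNP frequency to the nearest expected clade frequency scores how well a candidate tree and abundance vector explain the pooled reads. A full method would replace that nearest-match step with a probabilistic model of read counts and sequencing error.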