BMC Bioinformatics (Oct 2018)
HaploJuice: accurate haplotype assembly from a pool of sequences with known relative concentrations
Abstract
Background
Pooling techniques, in which multiple sub-samples are mixed into a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy that does not use barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed that uses these proportions to reconstruct the three haplotypes.

Results
HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes with a dynamic programming approach. HaploJuice was evaluated against five assembly algorithms: Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). On both simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate.

Conclusion
HaploJuice reconstructs haplotypes with high accuracy, making Ranjard et al.'s pooling strategy more efficient, feasible and applicable while reducing sequencing cost.
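The reason known, unequal proportions make barcode-free pooling work can be illustrated with a toy example. With 62.5%, 25% and 12.5%, every subset of the three sub-samples has a distinct total (0, 12.5, 25, 37.5, 62.5, 75, 87.5 and 100%), so the frequency of an allele at a site indicates which haplotypes carry it. The Python sketch below is a minimal illustration under strong simplifying assumptions: noiseless allele frequencies, sites treated independently (no read linkage), and a brute-force grid search standing in for the estimation of empirical proportions. It is not HaploJuice's dynamic programming algorithm, and the function names (`assign_site`, `estimate_proportions`, `reconstruct`) and the toy haplotypes are hypothetical.

```python
from itertools import product

# Nominal mixing proportions from the pooling strategy (62.5%, 25%, 12.5%).
NOMINAL = (0.625, 0.25, 0.125)


def assign_site(freqs, props):
    """Choose one allele per haplotype so that the predicted allele
    frequencies (sum of the proportions of the haplotypes carrying each
    allele) best match the observed ones, by squared error."""
    alleles = sorted(freqs)
    best, best_err = None, float("inf")
    for combo in product(alleles, repeat=len(props)):
        predicted = dict.fromkeys(alleles, 0.0)
        for allele, p in zip(combo, props):
            predicted[allele] += p
        err = sum((predicted[a] - freqs[a]) ** 2 for a in alleles)
        if err < best_err:
            best, best_err = combo, err
    return best, best_err


def estimate_proportions(sites, step=100):
    """Hypothetical grid search for empirical proportions p1 >= p2 >= p3 > 0
    summing to 1 that minimise the total per-site assignment error."""
    best, best_err = None, float("inf")
    for i in range(1, step):
        for j in range(1, i + 1):
            k = step - i - j
            if not 0 < k <= j:
                continue
            props = (i / step, j / step, k / step)
            err = sum(assign_site(f, props)[1] for f in sites)
            if err < best_err:
                best, best_err = props, err
    return best


def reconstruct(sites, props):
    """Build each haplotype from its assigned allele at every site
    (sites are treated independently; no read linkage is used)."""
    columns = [assign_site(f, props)[0] for f in sites]
    return ["".join(col[h] for col in columns) for h in range(len(props))]


def observed(haps, props):
    """Noiseless per-site allele frequencies for a pool of haplotypes."""
    sites = []
    for column in zip(*haps):
        freqs = {}
        for allele, p in zip(column, props):
            freqs[allele] = freqs.get(allele, 0.0) + p
        sites.append(freqs)
    return sites


# Toy pool (hypothetical): empirical proportions drift from the nominal ones.
true_props = (0.60, 0.27, 0.13)
true_haps = ["ACGTA", "ATACA", "GCATT"]
sites = observed(true_haps, true_props)

est = estimate_proportions(sites)
print(est)                      # (0.6, 0.27, 0.13)
print(reconstruct(sites, est))  # ['ACGTA', 'ATACA', 'GCATT']
```

In this noiseless toy case, the nominal proportions also recover the haplotypes, because every subset total stays distinct. With real reads, sequencing errors, coverage variation and the need to link alleles across sites make the problem much harder, which is where estimating the empirical proportions and a full assembly method such as HaploJuice's dynamic programming approach become necessary.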
Keywords