BMC Genomics (Dec 2017)

Single molecule sequencing-guided scaffolding and correction of draft assemblies

  • Shenglong Zhu,
  • Danny Z. Chen,
  • Scott J. Emrich

DOI
https://doi.org/10.1186/s12864-017-4271-8
Journal volume & issue
Vol. 18, no. S10
pp. 51 – 59

Abstract

Background: Although single molecule sequencing is still improving, the lengths of the generated sequences are inevitably an advantage in genome assembly. Prior work that uses long reads for genome assembly has mostly focused on correcting sequencing errors and improving the contiguity of de novo assemblies.

Results: We propose a disassembling-reassembling approach for both correcting structural errors in the draft assembly and scaffolding a target assembly based on error-corrected single molecule sequences. To achieve this goal, we formulate a maximum alternating path cover problem. We prove that this problem is NP-hard, and solve it with a 2-approximation algorithm.

Conclusions: Our experimental results show that our approach can improve the structural correctness of target assemblies at the cost of some contiguity, even with smaller amounts of long reads. In addition, our reassembling process can also serve as a competitive scaffolder relative to well-established assembly benchmarks.

Keywords