PLoS ONE (Jun 2010)

De Novo assembly of the complete genome of an enhanced electricity-producing variant of Geobacter sulfurreducens using only short reads.

  • Harish Nagarajan,
  • Jessica E Butler,
  • Anna Klimes,
  • Yu Qiu,
  • Karsten Zengler,
  • Joy Ward,
  • Nelson D Young,
  • Barbara A Methé,
  • Bernhard Ø Palsson,
  • Derek R Lovley,
  • Christian L Barrett

DOI
https://doi.org/10.1371/journal.pone.0010922
Journal volume & issue
Vol. 5, no. 6
p. e10922

Abstract

Read online

State-of-the-art DNA sequencing technologies are transforming the life sciences due to their ability to generate nucleotide sequence information with a speed and quantity that is unapproachable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated.We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. And second, we used the output of different short read assembly programs in such a way so as to leverage the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm.The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.