GigaScience (Dec 2012)

SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler

  • Luo Ruibang,
  • Liu Binghang,
  • Xie Yinlong,
  • Li Zhenyu,
  • Huang Weihua,
  • Yuan Jianying,
  • He Guangzhu,
  • Chen Yanxiang,
  • Pan Qi,
  • Liu Yunjie,
  • Tang Jingbo,
  • Wu Gengxiong,
  • Zhang Hao,
  • Shi Yujian,
  • Liu Yong,
  • Yu Chang,
  • Wang Bo,
  • Lu Yao,
  • Han Changlei,
  • Cheung David W,
  • Yiu Siu-Ming,
  • Peng Shaoliang,
  • Zhu Xiaoqian,
  • Liu Guangming,
  • Liao Xiangke,
  • Li Yingrui,
  • Yang Huanming,
  • Wang Jian,
  • Lam Tak-Wah,
  • Wang Jun

DOI
https://doi.org/10.1186/2047-217X-1-18
Journal volume & issue
Vol. 1, no. 1
p. 18

Abstract

Background

De novo genome assembly from next-generation sequencing (NGS) short reads is being performed on a rapidly increasing scale; however, several major challenges must still be overcome for it to be both efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but its continuity, accuracy, and coverage still need improvement, especially in repeat regions.

Findings

To overcome these challenges, we developed its successor, SOAPdenovo2, which features a new algorithm design that reduces memory consumption during graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes.

Conclusions

Benchmarks using the Assemblathon 1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers in both assembly length and accuracy. We also provide an updated assembly of the 2008 Asian (YH) genome built with SOAPdenovo2. Its contig and scaffold N50 were ~20.9 kbp and ~22 Mbp, respectively, 3-fold and 50-fold longer than in the first published version. Genome coverage increased from 81.16% to 93.91%, and peak memory consumption was reduced by ~2/3.
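For readers unfamiliar with the N50 statistic quoted above: N50 is the length L such that contigs (or scaffolds) of length >= L together cover at least half of the total assembly length. Below is a minimal illustrative sketch of that computation; it is not taken from the SOAPdenovo2 codebase, and the function name and example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    account for at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

# Hypothetical contig lengths in bp (not data from the paper).
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))  # 400: 500 + 400 = 900 >= 1500 / 2
```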
