Naučno-tehničeskij Vestnik Informacionnyh Tehnologij, Mehaniki i Optiki (Jan 2016)

ERROR CORRECTION METHOD FOR SEQUENCING DATA WITH INSERTIONS AND DELETIONS

  • A. V. Alexandrov,
  • A. A. Shalyto

DOI
https://doi.org/10.17586/2226-1494-2016-16-1-108-114
Journal volume & issue
Vol. 16, no. 1
pp. 108 – 114

Abstract

Subject of Research. A method was developed for correcting errors, including insertions and deletions, in sequencing reads of a haploid organism. It was tested on two libraries: a synthesized dataset for the bacterium Escherichia coli and a real dataset of reads for Pseudomonas stutzeri. Method. The method uses k-mers only to find reads that are close to each other. For the close reads, a consensus string is built and then used to correct errors in the original reads. Main Results. The algorithm is implemented as a standalone program, which has been tested on both real and synthesized data. The method outperforms the other known methods it was compared with; the comparison metrics were N50, total contig length, and maximal contig length. Practical Relevance. The method can be used together with existing genome assembly methods that are not suited to reads containing insertion and deletion errors.
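
The idea in the Method part can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the parameters k, min_shared and max_edits, and the toy reads are all assumptions made for illustration. Two simplifications are swapped in and should be read as such: close reads are found by brute-force pairwise comparison of k-mer sets, and the "consensus" is taken to be the read in the group with the smallest total edit distance to the others (a set median), rather than a string built from an alignment. The toy reads are also assumed to cover the same region.

```python
from collections import defaultdict


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def edit_distance(a, b):
    """Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def close_reads(reads, k, min_shared):
    """Map each read index to the indices of reads sharing >= min_shared k-mers."""
    kmer_sets = [kmers(r, k) for r in reads]
    close = defaultdict(set)
    for a in range(len(reads)):
        for b in range(a + 1, len(reads)):
            if len(kmer_sets[a] & kmer_sets[b]) >= min_shared:
                close[a].add(b)
                close[b].add(a)
    return close


def correct_reads(reads, k=4, min_shared=3, max_edits=2):
    """Replace each read by its group's set-median read if it is within max_edits."""
    close = close_reads(reads, k, min_shared)
    corrected = list(reads)
    for i, read in enumerate(reads):
        group = [read] + [reads[j] for j in sorted(close[i])]
        if len(group) < 3:
            continue  # too few close reads to trust a consensus
        # Assumption: the set median stands in for a true consensus string.
        consensus = min(group, key=lambda c: sum(edit_distance(c, g) for g in group))
        if edit_distance(read, consensus) <= max_edits:
            corrected[i] = consensus
    return corrected


# Hypothetical toy data: three error-free copies, one read with a deletion,
# one with an insertion, and one unrelated read.
reference = "ACGTTGCAAGGCT"
toy_reads = [reference, reference, reference,
             "ACGTTGAAGGCT",    # one base deleted
             "ACGTTGCAATGGCT",  # one base inserted
             "TTTTTTTTTTTT"]    # shares no k-mers with the others
print(correct_reads(toy_reads))
```

In this toy run the reads with the deleted and the inserted base are both replaced by the majority sequence, while the unrelated read is left unchanged because it has no close reads. Using edit distance rather than position-by-position comparison is what lets the sketch tolerate insertions and deletions.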

Keywords