Genome Biology (Jan 2021)

Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly

  • Guillaume Holley,
  • Doruk Beyter,
  • Helga Ingimundardottir,
  • Peter L. Møller,
  • Snædis Kristmundsdottir,
  • Hannes P. Eggertsson,
  • Bjarni V. Halldorsson

DOI
https://doi.org/10.1186/s13059-020-02244-4
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 22

Abstract

A major challenge with long read sequencing data is their high error rate of up to 15%. We present Ratatosk, a method to correct long reads with short read data. We demonstrate on 5 human genome trios that Ratatosk reduces the error rate of long reads 6-fold on average, with a median error rate as low as 0.22%. SNP calls in Ratatosk-corrected reads are nearly 99% accurate, and indel call accuracy is increased by up to 37%. An assembly of Ratatosk-corrected reads from an Ashkenazi individual yields a contig N50 of 45 Mbp and fewer misassemblies than a PacBio HiFi reads assembly.