BMC Genomics (Jun 2022)

Manipulating base quality scores enables variant calling from bisulfite sequencing alignments using conventional bayesian approaches

  • Adam Nunn,
  • Christian Otto,
  • Mario Fasold,
  • Peter F Stadler,
  • David Langenberger

DOI: https://doi.org/10.1186/s12864-022-08691-6
Journal volume & issue: Vol. 23, no. 1, pp. 1–10

Abstract

Background: Calling germline SNP variants from bisulfite-converted sequencing data poses a challenge for conventional software, which has no inherent capability to distinguish true polymorphisms from artificial mutations induced by the chemical treatment. Nevertheless, SNP data is desirable both for genotyping and for understanding the DNA methylome in the context of the genetic background. The confounding effect of bisulfite conversion can, however, be resolved conceptually by observing differences in allele counts on a per-strand basis, whereby artificial mutations are reflected by non-complementary base pairs.

Results: Herein, we present a computational pre-processing approach for adapting sequence alignment data, thus indirectly enabling downstream analysis on a per-strand basis using conventional variant calling software such as GATK or Freebayes. In comparison to specialised tools, the method represents a marked improvement in precision and sensitivity on high-quality, published benchmark datasets for both human and model plant variants.

Conclusion: The presented "double-masking" procedure is an open-source, easy-to-use method that facilitates accurate variant calling with conventional software, thus removing any dependency on specialised tools and mitigating the need to generate additional, conventional sequencing libraries alongside bisulfite sequencing experiments. The method is available at https://github.com/bio15anu/revelio, and an implementation with Freebayes is available at https://github.com/EpiDiverse/SNP.
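The per-strand reasoning in the abstract can be illustrated with a small Python sketch. It is a simplified, hypothetical model rather than the authors' code: `AlignedRead`, `mask_ambiguous_bases` and `allele_counts` are illustrative names, reads are assumed to be already projected onto forward-strand reference coordinates without indels, and the masking rule (Phred quality 0 for every T on the C-to-T converted strand and every A on the G-to-A converted strand) is an assumption chosen for illustration, not the exact double-masking procedure implemented in revelio. The sketch shows how lowering base qualities, instead of changing bases, lets an ordinary quality-aware allele count rely on the strand that still reports the true base.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal read record (not a pysam/htslib type).

    Bases and Phred qualities are assumed to be projected onto forward-strand
    reference coordinates with no indels; `converted` records which bisulfite
    strand the aligner assigned the read to.
    """
    bases: str        # read bases in reference orientation, e.g. "TCGA"
    quals: List[int]  # Phred base qualities, one per base
    converted: str    # "CT" (C-to-T converted strand) or "GA" (G-to-A)


def mask_ambiguous_bases(read: AlignedRead) -> AlignedRead:
    """Return a copy of `read` with quality 0 on bisulfite-ambiguous bases.

    Simplified rule (an assumption, not revelio's exact procedure): on the
    C-to-T strand an observed T may be a genuine T or an unmethylated C, and
    on the G-to-A strand an observed A may be a genuine A or a converted G.
    """
    if read.converted not in ("CT", "GA"):
        raise ValueError(f"unknown conversion strand: {read.converted!r}")
    ambiguous = "T" if read.converted == "CT" else "A"
    quals = [0 if base == ambiguous else q for base, q in zip(read.bases, read.quals)]
    return replace(read, quals=quals)


def allele_counts(reads: List[AlignedRead], pos: int, min_bq: int = 1) -> Dict[str, int]:
    """Count alleles at `pos`, skipping bases below `min_bq`, in the spirit of
    the minimum base-quality filter applied by conventional callers."""
    counts: Dict[str, int] = {}
    for read in reads:
        if read.quals[pos] >= min_bq:
            base = read.bases[pos]
            counts[base] = counts.get(base, 0) + 1
    return counts


# Reference C, unmethylated, no SNP: the C-to-T strand shows T, which only
# looks like a C>T mutation, while the G-to-A strand still reports C.
reads = [
    AlignedRead("T", [30], "CT"),
    AlignedRead("T", [35], "CT"),
    AlignedRead("C", [32], "GA"),
]
masked = [mask_ambiguous_bases(r) for r in reads]
print(allele_counts(reads, 0))   # {'T': 2, 'C': 1}
print(allele_counts(masked, 0))  # {'C': 1}

# Heterozygous C/T SNP: the G-to-A strand reports C and T faithfully.
snp_reads = [
    AlignedRead("C", [30], "CT"),
    AlignedRead("T", [30], "CT"),
    AlignedRead("C", [30], "GA"),
    AlignedRead("T", [30], "GA"),
]
print(allele_counts([mask_ambiguous_bases(r) for r in snp_reads], 0))  # {'C': 2, 'T': 1}
```

In the first case the raw counts suggest a C>T mismatch, but after masking only the C reported by the G-to-A strand remains, so no variant is supported. In the second case a genuine C/T heterozygote still shows both alleles, because on the G-to-A strand C and T are not affected by conversion. Because the sketch only alters quality values and leaves the bases untouched, the masked reads keep the same shape as the input, which is the property that lets unmodified, quality-aware callers consume such data.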

Keywords