BMC Genomics (May 2018)

Towards pan-genome read alignment to improve variation calling

  • Daniel Valenzuela,
  • Tuukka Norri,
  • Niko Välimäki,
  • Esa Pitkänen,
  • Veli Mäkinen

DOI
https://doi.org/10.1186/s12864-018-4465-8
Journal volume & issue
Vol. 19, no. S2
pp. 123 – 130

Abstract

Background: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation.

Results: We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation – a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC.

Conclusions: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.

Keywords