Reliability of genomic variants across different next-generation sequencing platforms and bioinformatic processing pipelines

Stephan Weißbach; Stanislav Sys; Charlotte Hewel; Hristo Todorov; Susann Schweiger; Jennifer Winter; Markus Pfenninger; Ali Torkamani; Doug Evans; Joachim Burger; Karin Everschor-Sitte; Helen Louise May-Simera; Susanne Gerber

doi:10.1186/s12864-020-07362-8

BMC Genomics (Jan 2021)

Affiliations

Stephan Weißbach: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Stanislav Sys: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Charlotte Hewel: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Hristo Todorov: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Susann Schweiger: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Jennifer Winter: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz
Markus Pfenninger: Department of Molecular Ecology, Senckenberg Biodiversity and Climate Research Centre
Ali Torkamani: Department of Integrative Structural and Computational Biology, Scripps Research Translational Institute, California Campus
Doug Evans: Department of Integrative Structural and Computational Biology, Scripps Research Translational Institute, California Campus
Joachim Burger: Institute of Anthropology, Johannes Gutenberg-University Mainz
Karin Everschor-Sitte: Institute of Physics, Johannes Gutenberg-University Mainz
Helen Louise May-Simera: Institute of Molecular Physiology, Johannes Gutenberg-University Mainz
Susanne Gerber: Institute of Human Genetics, University Medical Center of the Johannes Gutenberg-University Mainz

Journal volume & issue: Vol. 22, no. 1
pp. 1 – 15

Abstract

Background: Next Generation Sequencing (NGS) is the foundation of various studies, providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors in single nucleotide variant calls, we performed a cross-sectional observational study on a genomic cohort of 99 subjects, each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq, and (iii) Complete Genomics and processed with the respective bioinformatic pipeline. We also repeated variant calling for the Illumina cohorts with GATK, which allowed us to investigate the effect of the bioinformatics analysis strategy separately from the sequencing platform’s impact.

Results: The number of detected variants/variant classes per individual was highly dependent on the experimental setup. We observed a statistically significant overrepresentation of variants uniquely called by a single setup, indicating potential systematic biases. Insertion/deletion polymorphisms (indels) were associated with decreased concordance compared to single nucleotide polymorphisms (SNPs). The discrepancies in absolute indel numbers were particularly prominent in introns, Alu elements, simple repeats, and regions with medium GC content. Notably, reprocessing sequencing data following the best practice recommendations of GATK considerably improved concordance between the respective setups.

Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Furthermore, our results demonstrate the benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.
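As an illustration of the two quantities the abstract refers to, concordance between setups and variants called uniquely by a single setup, the following is a minimal Python sketch. It is not the authors' pipeline: the abstract does not state how concordance was measured, so the Jaccard index is an assumption here, and the function names, setup labels, and toy variants are all hypothetical.

```python
from itertools import combinations

# Assumption: a variant is identified by (chromosome, position, ref, alt).
# Real comparisons would first normalize indel representation.


def jaccard_concordance(a, b):
    """Shared calls divided by all calls made by either setup."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unique_calls(call_sets):
    """For each setup, return the variants that no other setup called."""
    unique = {}
    for name, calls in call_sets.items():
        others = set().union(
            *(c for other, c in call_sets.items() if other != name)
        )
        unique[name] = calls - others
    return unique


# Hypothetical toy call sets for one individual (not data from the study).
call_sets = {
    "hiseq_x": {
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
        ("chr2", 50, "G", "GA"),
    },
    "hiseq": {
        ("chr1", 100, "A", "G"),
        ("chr1", 200, "C", "T"),
    },
    "complete_genomics": {
        ("chr1", 100, "A", "G"),
        ("chr3", 10, "T", "C"),
    },
}

# Pairwise concordance between every two setups.
pairwise = {
    (x, y): jaccard_concordance(call_sets[x], call_sets[y])
    for x, y in combinations(call_sets, 2)
}
unique = unique_calls(call_sets)

for pair, value in pairwise.items():
    print(pair, round(value, 3))
for name, calls in unique.items():
    print(name, "unique calls:", len(calls))
```

Computing the same numbers separately for SNPs and indels, or per genomic annotation such as introns or repeats, would be one way to look for the kind of class-specific discrepancies described in the results.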

Published in BMC Genomics

ISSN: 1471-2164 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Biology (General): Genetics
Website: http://bmcgenomics.biomedcentral.com