F1000Research (Sep 2024)

NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling [version 2; peer review: 2 approved]

  • Felix Wiegand,
  • Bianca Stöcker,
  • Famke Bäuerle,
  • Susanne Motameny,
  • Andreas Buness,
  • Alexander J. Probst,
  • Fabian Brand,
  • Axel Schmidt,
  • Tyll Stöcker,
  • Sugirthan Sivalingam,
  • Andreas Petzold,
  • Marc Sturm,
  • Janine Altmueller,
  • Johannes Köster,
  • Kerstin Becker,
  • Leon Brandhoff,
  • Anna Ossowski,
  • Christian Mertes,
  • Avirup Guha Neogi,
  • Gisela Gabernet,
  • Nicholas H. Smith,
  • Friederike Hanssen

Journal volume & issue
Vol. 12

Abstract

We present the results of the human genomic small variant calling benchmarking initiative of the German Research Foundation (DFG) funded Next Generation Sequencing Competence Network (NGS-CN) and the German Human Genome-Phenome Archive (GHGA). In this effort, we developed NCBench, a continuous benchmarking platform for the evaluation of small genomic variant callsets in terms of recall, precision, and false positive/negative error patterns. NCBench is implemented as a continuously re-evaluated open-source repository. We show that it is possible to rely entirely on free, public infrastructure (GitHub, GitHub Actions, Zenodo) in combination with established open-source tools. NCBench is agnostic to the dataset used and can evaluate an arbitrary number of given callsets, while reporting the results in a visual and interactive way. We used NCBench to evaluate over 40 callsets generated by the variant calling pipelines available in the participating groups, run on three exome datasets from different enrichment kits and at different coverages. While all pipelines achieve high overall quality, subtle systematic differences between callers and datasets exist and are made apparent by NCBench. These insights are useful for improving existing pipelines and developing new workflows. NCBench is meant to be open for the contribution of any given callset. Most importantly, it removes the need for authors to repeatedly re-implement paper-specific variant calling benchmarks when publishing new tools or pipelines, while readers will benefit from being able to (continuously) observe the performance of tools and pipelines at the time of reading instead of at the time of writing.
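The evaluation described in the abstract centres on recall and precision of candidate callsets compared against a truth set. As a minimal, hypothetical sketch (illustrative only, not the NCBench implementation), the following Python snippet shows how these metrics follow from the true positive, false positive, and false negative counts produced by any callset-to-truth-set comparison; the class and field names are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkCounts:
    """Counts from comparing a callset against a truth set (illustrative schema)."""
    tp: int  # variants present in both the callset and the truth set
    fp: int  # variants called but absent from the truth set
    fn: int  # truth-set variants missed by the callset

    @property
    def precision(self) -> float:
        # Fraction of called variants that are correct.
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0

    @property
    def recall(self) -> float:
        # Fraction of truth-set variants that were recovered.
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical exome callset evaluated against a truth set (numbers are made up).
counts = BenchmarkCounts(tp=24_850, fp=120, fn=310)
print(f"precision={counts.precision:.4f} recall={counts.recall:.4f} f1={counts.f1:.4f}")
```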

Keywords