BMC Bioinformatics (Apr 2021)

Low-level variant calling for non-matched samples using a position-based and nucleotide-specific approach

  • Jeffrey N. Dudley,
  • Celine S. Hong,
  • Marwan A. Hawari,
  • Jasmine Shwetar,
  • Julie C. Sapp,
  • Justin Lack,
  • Henoke Shiferaw,
  • NISC Comparative Sequencing Program,
  • Jennifer J. Johnston,
  • Leslie G. Biesecker

DOI: https://doi.org/10.1186/s12859-021-04090-y
Journal volume & issue: Vol. 22, no. 1, pp. 1–17

Abstract


Background: The widespread use of next-generation sequencing has revealed an important role for somatic mosaicism in many diseases. However, detecting low-level mosaic variants in next-generation sequencing data remains challenging.

Results: We present Position-Based Variant Identification (PBVI), a method that uses empirically derived distributions of alternate nucleotides from a control dataset. We modeled this approach on 11 segmental overgrowth genes. We show that PBVI improves detection of single-nucleotide mosaic variants at variant allele fractions of 0.01–0.05 compared with other low-level variant callers. At sequencing depths of 600× and 1200×, sensitivity exceeded 85% and 95%, respectively. In a cohort of 26 individuals with somatic overgrowth disorders, PBVI showed an improved signal-to-noise ratio and identified pathogenic variants in 17 individuals.

Conclusion: PBVI can facilitate the identification of low-level mosaic variants, thereby increasing the utility of next-generation sequencing data for research and diagnostic purposes.
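The core idea of a position-based, nucleotide-specific approach can be illustrated with a short sketch. The abstract does not specify PBVI's statistical test, so this example assumes a simple z-score cutoff: for every (position, alternate nucleotide) pair, the alternate-allele fractions observed in control samples form a background distribution, and a test sample is flagged only where its fraction lies far above that background. All function names, the `z_cutoff` and `min_sd` parameters, and the read counts below are hypothetical illustrations, not the authors' implementation or data.

```python
from statistics import mean, stdev

NUCLEOTIDES = ("A", "C", "G", "T")


def alt_fraction(counts, nucleotide):
    """Fraction of reads at one position supporting `nucleotide`."""
    depth = sum(counts.values())
    return counts.get(nucleotide, 0) / depth if depth else 0.0


def build_background(control_samples, reference):
    """Collect control alternate fractions keyed by (position, nucleotide).

    control_samples: list of dicts mapping position -> {nucleotide: read count}.
    reference: dict mapping position -> reference base (skipped as non-alternate).
    """
    background = {}
    for sample in control_samples:
        for pos, counts in sample.items():
            for nt in NUCLEOTIDES:
                if nt == reference[pos]:
                    continue
                background.setdefault((pos, nt), []).append(alt_fraction(counts, nt))
    return background


def call_variants(test_sample, reference, background, z_cutoff=5.0, min_sd=1e-4):
    """Flag alternate nucleotides whose fraction exceeds the control background.

    Assumption: a z-score against each position- and nucleotide-specific control
    distribution; min_sd keeps near-zero-noise positions from dividing by ~0.
    """
    calls = []
    for pos, counts in test_sample.items():
        for nt in NUCLEOTIDES:
            if nt == reference[pos]:
                continue
            control = background.get((pos, nt))
            if not control or len(control) < 2:
                continue  # stdev needs at least two control observations
            mu = mean(control)
            sd = max(stdev(control), min_sd)
            vaf = alt_fraction(counts, nt)
            z = (vaf - mu) / sd
            if z > z_cutoff:
                calls.append((pos, nt, round(vaf, 4), round(z, 1)))
    return calls


# Hypothetical toy data: two positions, each sequenced to 1000x.
reference = {100: "C", 101: "G"}
controls = [
    {100: {"C": 995, "T": 5}, 101: {"G": 998, "A": 2}},
    {100: {"C": 994, "T": 6}, 101: {"G": 997, "A": 3}},
    {100: {"C": 996, "T": 4}, 101: {"G": 999, "A": 1}},
]
test_sample = {100: {"C": 980, "T": 20}, 101: {"G": 997, "A": 3}}

background = build_background(controls, reference)
calls = call_variants(test_sample, reference, background)
print(calls)
```

In this toy run, only the T at position 100 (fraction 0.02 against a control background of roughly 0.005) is called; the A at position 101 stays at its usual noise level and is not reported. Keeping a separate background per position and per alternate nucleotide is what lets a caller of this kind tolerate sites with higher systematic error without raising the threshold everywhere.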

Keywords