BMC Bioinformatics (Jul 2022)

Algorithmic improvements for discovery of germline copy number variants in next-generation sequencing data

  • Brendan O’Fallon
  • Jacob Durtschi
  • Ana Kellogg
  • Tracey Lewis
  • Devin Close
  • Hunter Best

DOI
https://doi.org/10.1186/s12859-022-04820-w
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 14

Abstract

Background: Copy number variants (CNVs) play a significant role in human heredity and disease. However, sensitive and specific characterization of germline CNVs from NGS data has remained challenging, particularly for hybridization-capture data in which read counts are the primary source of copy number information.

Results: We describe two algorithmic adaptations that improve CNV detection accuracy in a Hidden Markov Model (HMM) context. First, we present a method for computing target- and copy number-specific emission distributions. Second, we demonstrate that the Pointwise Maximum a posteriori (PMAP) HMM decoding procedure yields improved sensitivity for small CNV calls compared to the more common Viterbi HMM decoder. We develop a prototype implementation, called Cobalt, and compare it to other CNV detection tools using sets of simulated and previously detected CNVs with sizes spanning a single exon to a full chromosome.

Conclusions: In both the simulation and previously detected CNV studies, Cobalt shows similar sensitivity but significantly fewer false positive detections compared to other callers. Overall sensitivity is 80–90% for deletion CNVs spanning 1–4 targets and 90–100% for larger deletion events, while sensitivity is somewhat lower for small duplication CNVs.
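
The contrast between the two decoders named in the abstract can be illustrated with a small, generic HMM. The sketch below is not Cobalt's implementation: the three copy-number states, the Gaussian emission parameters, the transition probabilities, and all function names are assumptions chosen only for illustration. Viterbi returns the single most probable state path, whereas pointwise maximum a posteriori (PMAP) decoding picks, at each target independently, the state with the highest posterior probability computed by the forward-backward algorithm.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution (assumed emission model)."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def emission_matrix(obs, means, sds):
    """emis[t][k] = likelihood of observation t under state k."""
    return [[gaussian_pdf(x, m, s) for m, s in zip(means, sds)] for x in obs]

def viterbi(init, trans, emis):
    """Most probable joint state path, computed in log space."""
    n = len(init)
    score = [math.log(init[k]) + math.log(emis[0][k]) for k in range(n)]
    back = []
    for e in emis[1:]:
        ptr, new = [], []
        for k in range(n):
            best = max(range(n), key=lambda j: score[j] + math.log(trans[j][k]))
            ptr.append(best)
            new.append(score[best] + math.log(trans[best][k]) + math.log(e[k]))
        back.append(ptr)
        score = new
    state = max(range(n), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):  # trace back-pointers from the last target
        state = ptr[state]
        path.append(state)
    return path[::-1]

def posteriors(init, trans, emis):
    """Per-target state posteriors via scaled forward-backward."""
    n = len(init)
    prev = [init[k] * emis[0][k] for k in range(n)]
    total = sum(prev)
    prev = [p / total for p in prev]
    fwd = [prev]
    for e in emis[1:]:
        cur = [e[k] * sum(prev[j] * trans[j][k] for j in range(n)) for k in range(n)]
        total = sum(cur)
        prev = [c / total for c in cur]
        fwd.append(prev)
    bwd = [[1.0] * n]
    for e in reversed(emis[1:]):
        nxt = bwd[0]
        cur = [sum(trans[k][j] * e[j] * nxt[j] for j in range(n)) for k in range(n)]
        total = sum(cur)
        bwd.insert(0, [c / total for c in cur])
    post = []
    for f, b in zip(fwd, bwd):
        g = [fi * bi for fi, bi in zip(f, b)]
        total = sum(g)
        post.append([x / total for x in g])
    return post

def pmap(init, trans, emis):
    """Pointwise MAP: argmax of the posterior at each target independently."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in posteriors(init, trans, emis)]

# Hypothetical states: 0 = deletion, 1 = diploid, 2 = duplication.
# Observations are normalized depth ratios per target (made-up values).
means, sds = [0.5, 1.0, 1.5], [0.15, 0.15, 0.15]
init = [0.01, 0.98, 0.01]
trans = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]
obs = [1.02, 0.97, 0.71, 1.05, 0.99, 1.01]
emis = emission_matrix(obs, means, sds)
print("Viterbi:", viterbi(init, trans, emis))
print("PMAP:   ", pmap(init, trans, emis))
```

On clear-cut signals the two decoders return the same path; they can disagree on short, noisy events, because the pointwise path need not be the single most probable joint path.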

Keywords