Genome Biology (Jun 2024)

Evaluation of somatic copy number variation detection by NGS technologies and bioinformatics tools on a hyper-diploid cancer genome

  • Daniall Masood,
  • Luyao Ren,
  • Cu Nguyen,
  • Francesco G. Brundu,
  • Lily Zheng,
  • Yongmei Zhao,
  • Erich Jaeger,
  • Yong Li,
  • Seong Won Cha,
  • Aaron Halpern,
  • Sean Truong,
  • Michael Virata,
  • Chunhua Yan,
  • Qingrong Chen,
  • Andy Pang,
  • Reyes Alberto,
  • Chunlin Xiao,
  • Zhaowei Yang,
  • Wanqiu Chen,
  • Charles Wang,
  • Frank Cross,
  • Severine Catreux,
  • Leming Shi,
  • Julia A. Beaver,
  • Wenming Xiao,
  • Daoud M. Meerzaman

DOI
https://doi.org/10.1186/s13059-024-03294-8
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 21

Abstract

Background

Copy number variation (CNV) is a key genetic characteristic in cancer diagnostics and can serve as a biomarker for selecting therapeutic treatments. Using data sets established in our previous study, we benchmark six of the most recent and commonly used software tools for cancer CNV calling on detection accuracy, sensitivity, and reproducibility. By comparison with orthogonal methods such as microarray and Bionano, we also explore the consistency of CNV calls across different technologies on a challenging genome.

Results

While copy gain, loss, and loss of heterozygosity (LOH) calls are consistent across sequencing centers, CNV callers, and technologies, variation in CNV calls is mostly driven by the determination of genome ploidy. Using consensus results from six CNV callers and confirmation from three orthogonal methods, we establish a high-confidence CNV call set for the reference cancer cell line HCC1395.

Conclusions

NGS technologies and current bioinformatics tools can offer reliable detection of copy gain, loss, and LOH. However, when working with a hyper-diploid genome, some software tools can call excessive copy gains or losses because of inaccurate assessment of genome ploidy. By providing performance metrics under various experimental conditions, this study raises awareness within the cancer research community regarding the selection of sequencing platforms, sample preparation methods, sequencing coverage, and CNV detection tools.
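The abstract attributes much of the caller-to-caller variation to ploidy determination. As a minimal, hypothetical illustration (not the method of this study or of any of the six callers), the Python sketch below labels the same absolute copy-number segments against two assumed baseline ploidies. The `classify` function, the segment coordinates, and the copy numbers are all invented for the example.

```python
from typing import List, Tuple

# Hypothetical segment: (chromosome, start, end, absolute integer copy number)
Segment = Tuple[str, int, int, int]


def classify(segments: List[Segment], ploidy: int) -> List[str]:
    """Label each segment as gain, loss, or neutral relative to an assumed baseline ploidy."""
    labels = []
    for _chrom, _start, _end, cn in segments:
        if cn > ploidy:
            labels.append("gain")
        elif cn < ploidy:
            labels.append("loss")
        else:
            labels.append("neutral")
    return labels


# Invented segments from a hyper-diploid-like genome whose typical copy number is 3
segments = [
    ("chr1", 0, 50_000_000, 3),
    ("chr2", 0, 40_000_000, 3),
    ("chr3", 0, 30_000_000, 2),
    ("chr4", 0, 20_000_000, 4),
]

print(classify(segments, ploidy=2))  # ['gain', 'gain', 'neutral', 'gain']
print(classify(segments, ploidy=3))  # ['neutral', 'neutral', 'loss', 'gain']
```

Under a diploid assumption, three of the four invented segments are labeled gains; with a baseline of 3, only one is. This shows in a simplified way how a misjudged ploidy can inflate gain or loss calls.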

Keywords