Scientific Reports (Apr 2024)

Towards accurate indel calling for oncopanel sequencing through an international pipeline competition at precisionFDA

  • Binsheng Gong,
  • Samir Lababidi,
  • Rebecca Kusko,
  • Khaled Bouri,
  • Sarah Prezek,
  • Vishal Thovarai,
  • Anish Prasanna,
  • Ezekiel J. Maier,
  • Mahdi Golkaram,
  • Xingqiang Sun,
  • Konstantinos Kyriakidis,
  • João Paulo Kitajima,
  • Sayed Mohammad Ebrahim Sahraeian,
  • Yunfei Guo,
  • Elaine Johanson,
  • Wendell Jones,
  • Weida Tong,
  • Joshua Xu

DOI
https://doi.org/10.1038/s41598-024-58573-y
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 12

Abstract

Accurately calling indels from next-generation sequencing (NGS) data is critical for clinical application. The precisionFDA team collaborated with the U.S. Food and Drug Administration’s (FDA’s) National Center for Toxicological Research (NCTR) to complete the NCTR Indel Calling from Oncopanel Sequencing Data Challenge, which evaluated the performance of indel calling pipelines. Top performers were selected based on precision, recall, and F1-score. Many other pipelines performed close to the top performers, forming a top cluster. Performance was significantly higher in high confidence regions and coding regions, and significantly lower in low complexity regions. Oncopanel capture and other issues may have affected the recall rate. Indels with higher variant allele frequency (VAF) may generally be called with higher confidence. Many of the indel calling pipelines performed well: some did so across all three oncopanels, while others were better for a specific oncopanel. Indel calling performance can be further improved by restricting calls to high confidence intervals (HCIs) and coding regions, and by excluding low complexity regions (LCRs). VAF cut-offs could be applied according to the application.
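
As an illustration of the metrics named above, the following minimal Python sketch computes precision, recall, and F1-score for a set of indel calls against a truth set, and applies a simple region and VAF filter. It is not the challenge's evaluation code: the `Indel` record, the `score` and `filter_calls` helpers, the half-open interval convention, and the exact-match comparison on (chromosome, position, ref, alt) are all assumptions made for illustration. Real comparisons typically need to normalize indel representation before matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indel:
    """Hypothetical minimal indel record (not a format from the paper)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float = 0.0


def _key(v):
    # Identity used for matching; VAF is deliberately excluded.
    return (v.chrom, v.pos, v.ref, v.alt)


def score(calls, truth):
    """Return (precision, recall, f1) using exact-match comparison."""
    call_keys = {_key(v) for v in calls}
    truth_keys = {_key(v) for v in truth}
    tp = len(call_keys & truth_keys)   # called and in truth set
    fp = len(call_keys - truth_keys)   # called but not in truth set
    fn = len(truth_keys - call_keys)   # in truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def filter_calls(calls, intervals, min_vaf):
    """Keep calls inside any (chrom, start, end) interval with VAF >= min_vaf.

    Intervals are assumed half-open: start <= pos < end.
    """
    kept = []
    for v in calls:
        inside = any(c == v.chrom and s <= v.pos < e for c, s, e in intervals)
        if inside and v.vaf >= min_vaf:
            kept.append(v)
    return kept
```

In this sketch, restricting calls to a set of intervals (for example, high confidence intervals) and applying a VAF cut-off before scoring corresponds to the filtering strategy the abstract suggests. The appropriate cut-off depends on the application and is not specified here.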

Keywords