Genome Biology (Jan 2022)

Achieving robust somatic mutation detection with deep learning models derived from reference data sets of a cancer sample

  • Sayed Mohammad Ebrahim Sahraeian
  • Li Tai Fang
  • Konstantinos Karagiannis
  • Malcolm Moos
  • Sean Smith
  • Luis Santana-Quintero
  • Chunlin Xiao
  • Michael Colgan
  • Huixiao Hong
  • Marghoob Mohiyuddin
  • Wenming Xiao

DOI
https://doi.org/10.1186/s13059-021-02592-9
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 20

Abstract

Background

Accurate detection of somatic mutations is challenging but critical to understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated its performance advantages on in silico data.

Results

In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance.

Conclusions

The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages. It showed significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.

Keywords