Frontiers in Genetics (Nov 2022)

TLsub: A transfer learning based enhancement to accurately detect mutations with wide-spectrum sub-clonal proportion

  • Tian Zheng,
  • Tian Zheng

DOI
https://doi.org/10.3389/fgene.2022.981269
Journal volume & issue
Vol. 13

Abstract

Read online

Mutation detecting is a routine work for sequencing data analysis and the trading of existing tools often involves the combinations of signals on a set of overlapped sequencing reads. However, the subclonal mutations, which are reported to contribute to tumor recurrence and metastasis, are sometimes eliminated by existing signals. When the clonal proportion decreases, signals often present ambiguous, while complicated interactions among signals break the IID assumption for most of the machine learning models. Although the mutation callers could lower the thresholds, false positives are significantly introduced. The main aim here was to detect the subclonal mutations with high specificity from the scenario of ambiguous sample purities or clonal proportions. We proposed a novel machine learning approach for filtering false positive calls to accurately detect mutations with wide spectrum subclonal proportion. We have carried out a series of experiments on both simulated and real datasets, and compared to several state-of-art approaches, including freebayes, MuTect2, Sentieon and SiNVICT. The results demonstrated that the proposed method adapts well to different diluted sequencing signals and can significantly reduce the false positive when detecting subclonal mutations. The codes have been uploaded at https://github.com/TrinaZ/TL-fpFilter for academic usage only.

Keywords