Heliyon (Oct 2024)
Predicting somatic mutation origins in cell-free DNA by semi-supervised GAN models
Abstract
Motivation: Distinguishing pathogenic cancer-associated mutations from other somatic variants present in cell-free DNA (cfDNA) is a key challenge in liquid biopsy. The distinction is critical, because misclassifying mutations arising from clonal hematopoiesis (CH) as tumor-derived, or vice versa, can lead to inaccurate diagnoses and inappropriate therapeutic interventions.

Results: We addressed this problem by developing a specialized machine learning technique that distinguishes tumor-derived from CH-related mutations in cfDNA. We established a comprehensive in-house reference catalog of approximately 25,000 single nucleotide variants (SNVs), each linked to either tumor or CH origin. This catalog serves as the training foundation for a deep learning model built on the semi-supervised generative adversarial network (SSGAN) architecture. By analyzing the genomic coordinates and nucleotide composition of cfDNA variants, our model achieves an area under the curve (AUC) of 95% in classifying uncharacterized variants as CH- or tumor-derived. In conclusion, our research emphasizes the potential of prediction based on genomic features of cfDNA variants as a robust alternative to conventional multi-analyte sequencing methods. This approach not only improves the accuracy of distinguishing CH from tumor mutations in liquid biopsy data but also underscores the promise of advanced data analysis and machine learning in genomics and personalized medicine.

Availability: https://github.com/FPalizban/SSGAN.
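To illustrate what "semi-supervised" means in an SSGAN classifier, the sketch below shows the widely used discriminator objective from Salimans et al. (2016), applied to the two real classes named here (CH and tumor). It is a minimal sketch, not the authors' implementation: it assumes the discriminator emits one logit per real class with the "fake" class logit implicitly fixed at 0, and it takes those logits as given. Feature extraction from genomic coordinates and nucleotide composition is not shown, and the names `CLASSES`, `discriminator_loss`, and `predict_origin` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label order for the K = 2 real classes. The discriminator emits
# one logit per real class; the "fake" class is implicit (its logit fixed at 0).
CLASSES = ("CH", "tumor")


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def supervised_loss(logits: Sequence[float], label: int) -> float:
    """Cross-entropy over the K real classes for one labeled variant."""
    return logsumexp(logits) - logits[label]


def real_unsupervised_loss(logits: Sequence[float]) -> float:
    """-log D(x) for an unlabeled real variant, where D(x) = Z / (Z + 1), Z = sum_k exp(l_k)."""
    return softplus(-logsumexp(logits))


def fake_unsupervised_loss(logits: Sequence[float]) -> float:
    """-log(1 - D(G(z))) = log(Z + 1) for a generator sample."""
    return softplus(logsumexp(logits))


def discriminator_loss(
    labeled: List[Tuple[Sequence[float], int]],
    unlabeled: List[Sequence[float]],
    fake: List[Sequence[float]],
) -> float:
    """Mean supervised loss plus mean unsupervised loss on real and fake batches."""
    sup = sum(supervised_loss(l, y) for l, y in labeled) / len(labeled)
    unsup_real = sum(real_unsupervised_loss(l) for l in unlabeled) / len(unlabeled)
    unsup_fake = sum(fake_unsupervised_loss(l) for l in fake) / len(fake)
    return sup + unsup_real + unsup_fake


def predict_origin(logits: Sequence[float]) -> str:
    """Assign an uncharacterized variant to the real class with the largest logit."""
    best = max(range(len(logits)), key=lambda k: logits[k])
    return CLASSES[best]


if __name__ == "__main__":
    # All-zero logits: log 2 + log 1.5 + log 3 = log 9
    print(discriminator_loss([([0.0, 0.0], 0)], [[0.0, 0.0]], [[0.0, 0.0]]))
    print(predict_origin([2.0, -1.0]))  # prints "CH"
```

The design point is that unlabeled cfDNA variants still contribute a training signal through the real-versus-fake term, while only the labeled catalog entries contribute to the CH-versus-tumor cross-entropy.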