PLoS ONE (Jan 2024)

GFPrint™: A machine learning tool for transforming genetic data into clinical insights.

  • Guillermo Sanz-Martín,
  • Daniela Paula Migliore,
  • Pablo Gómez Del Campo,
  • José Del Castillo-Izquierdo,
  • Juan Manuel Domínguez

DOI
https://doi.org/10.1371/journal.pone.0311370
Journal volume & issue
Vol. 19, no. 11
p. e0311370

Abstract

The increasing availability of large-scale genetic sequencing data in the clinical setting has created a need for tools that can fully exploit the information these data contain. GFPrint™ is a proprietary streaming algorithm designed to meet that need. By extracting the most relevant functional features, GFPrint™ transforms high-dimensional, noisy genetic sequencing data into an embedded representation, allowing unsupervised models to create data clusters that can be mapped back to the original clinical information. Ultimately, this allows the identification of genes and pathways relevant to disease onset and progression. GFPrint™ has been tested and validated using two publicly available cancer genomic datasets. Analysis of the TCGA dataset identified panels of genes whose mutations appear to negatively influence survival in patients with non-metastatic colorectal cancer (15 genes), epidermoid non-small cell lung cancer (167 genes) and pheochromocytoma (313 genes). Likewise, analysis of the Broad Institute dataset identified 75 genes involved in pathways related to extracellular matrix reorganization whose mutations appear to dictate a worse prognosis for breast cancer patients. GFPrint™ is accessible through a secure web portal and can be used in any therapeutic area where the genetic profile of patients influences disease evolution.
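
The abstract does not disclose how GFPrint™ works internally, so the following is only a generic sketch of the workflow it describes (functional features, embedding, unsupervised clustering, mapping clusters back to clinical data). Everything in it is an assumption made for illustration: the patient IDs, gene names, gene sets and survival values are invented toy data, the embedding is a simple per-gene-set mutation fraction, and plain k-means stands in for whatever unsupervised model the tool actually uses.

```python
from statistics import median

# Hypothetical toy cohort: patient -> mutated genes, and survival in months.
mutations = {
    "P1": {"G1", "G2"},
    "P2": {"G1", "G2", "G3"},
    "P3": {"G4"},
    "P4": set(),
    "P5": {"G2", "G3"},
    "P6": {"G5"},
}
survival_months = {"P1": 14, "P2": 10, "P3": 40, "P4": 55, "P5": 12, "P6": 38}

# Hypothetical functional gene sets used as features (not taken from the paper).
gene_sets = {
    "set_a": {"G1", "G2", "G3"},
    "set_b": {"G4", "G5"},
}


def embed(genes):
    """Embed a patient as the fraction of each gene set that is mutated."""
    return [len(genes & members) / len(members) for members in gene_sets.values()]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means, initialised deterministically from the first k distinct points."""
    centroids = []
    for p in points:
        if p not in centroids:
            centroids.append(list(p))
        if len(centroids) == k:
            break
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


patients = list(mutations)
points = [embed(mutations[p]) for p in patients]
labels = kmeans(points, k=2)
cluster_of = dict(zip(patients, labels))

# Map clusters back to the clinical variable: median survival per cluster.
median_by_cluster = {
    c: median(survival_months[p] for p in patients if cluster_of[p] == c)
    for c in set(labels)
}

for c, months in sorted(median_by_cluster.items()):
    members = [p for p in patients if cluster_of[p] == c]
    print(f"cluster {c}: patients={members}, median survival={months} months")
```

In this toy cohort the patients carrying mutations in the hypothetical `set_a` genes fall into one cluster with a shorter median survival. A real analysis would need proper survival statistics (for example, accounting for censoring) rather than a comparison of medians.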