Computational and Structural Biotechnology Journal (Jan 2020)

Comparison of unsupervised machine-learning methods to identify metabolomic signatures in patients with localized breast cancer

  • Jocelyn Gal,
  • Caroline Bailleux,
  • David Chardin,
  • Thierry Pourcher,
  • Julia Gilhodes,
  • Lun Jing,
  • Jean-Marie Guigonis,
  • Jean-Marc Ferrero,
  • Gerard Milano,
  • Baharia Mograbi,
  • Patrick Brest,
  • Yann Chateau,
  • Olivier Humbert,
  • Emmanuel Chamorey

Journal volume & issue
Vol. 18
pp. 1509 – 1524

Abstract

Genomics and transcriptomics have led to the widely used molecular classification of breast cancer (BC). However, heterogeneous biological behaviors persist within BC subtypes. Metabolomics is a rapidly expanding field of study dedicated to cellular metabolism as affected by the environment. The aim of this study was to compare metabolomic signatures of BC obtained by five different unsupervised machine learning (ML) methods. Fifty-two consecutive patients with BC with an indication for adjuvant chemotherapy between 2013 and 2016 were retrospectively included. We performed metabolomic profiling of tumor resection samples using liquid chromatography-mass spectrometry. A total of 449 identified metabolites were selected for further analysis. Clusters obtained using the five unsupervised ML methods (PCA k-means, sparse k-means, spectral clustering, SIMLR and k-sparse) were compared in terms of clinical and biological characteristics. With an optimal partitioning parameter k = 3, the five methods identified three prognosis groups of patients (favorable, intermediate, unfavorable) with different clinical and biological profiles. SIMLR and k-sparse were the most effective techniques in terms of clustering. In-silico survival analysis revealed a significant difference in 5-year predicted overall survival (OS) between the three clusters. Further pathway analysis using the 449 selected metabolites showed significant differences in amino acid and glucose metabolism between BC histologic subtypes. Our results provide proof-of-concept for the use of unsupervised ML metabolomics enabling stratification and personalized management of BC patients. The design of novel computational methods incorporating ML and bioinformatics techniques should make available tools particularly suited to improving the outcome of cancer treatment and reducing cancer-related mortality.
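
As an illustration of the simplest of the five approaches named above, PCA followed by k-means with k = 3, the following minimal sketch may help. It is not the authors' pipeline: the data are synthetic (52 samples by 449 features, mirroring only the dimensions reported in the abstract), the function name `pca_kmeans` and the choice of two principal components are assumptions made for the example, and the clustering step is a plain NumPy implementation of Lloyd's algorithm with a single random initialization.

```python
import numpy as np

def pca_kmeans(X, n_components=2, k=3, n_iter=100, seed=0):
    """Project X onto its top principal components, then run Lloyd's k-means.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = np.random.default_rng(seed)

    # Standardize each feature (metabolite) to zero mean and unit variance.
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - X.mean(axis=0)) / std

    # PCA via SVD: the sample scores are U * S on the leading components.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]

    # Initialize centroids from k distinct samples (single random start).
    centroids = scores[rng.choice(len(scores), size=k, replace=False)]
    labels = np.zeros(len(scores), dtype=int)

    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(scores[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)

        # Recompute centroids; keep the old one if a cluster becomes empty.
        new_centroids = np.array([
            scores[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    return labels, scores

# Synthetic stand-in for a metabolite intensity matrix (assumed, not real data).
rng = np.random.default_rng(42)
group_means = rng.normal(0.0, 2.0, size=(3, 449))
true_groups = np.repeat([0, 1, 2], [18, 17, 17])
X = group_means[true_groups] + rng.normal(0.0, 1.0, size=(52, 449))

labels, scores = pca_kmeans(X, n_components=2, k=3)
print(np.bincount(labels, minlength=3))  # cluster sizes
```

Because k-means depends on its starting centroids, a real analysis would typically use several restarts and compare the resulting partitions against clinical variables, as the study does for its five methods.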

Keywords