Nature Communications (Nov 2024)

Synthetic augmentation of cancer cell line multi-omic datasets using unsupervised deep learning

  • Zhaoxiang Cai,
  • Sofia Apolinário,
  • Ana R. Baião,
  • Clare Pacini,
  • Miguel D. Sousa,
  • Susana Vinga,
  • Roger R. Reddel,
  • Phillip J. Robinson,
  • Mathew J. Garnett,
  • Qing Zhong,
  • Emanuel Gonçalves

DOI
https://doi.org/10.1038/s41467-024-54771-4
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 12

Abstract

Integrating diverse types of biological data is essential for a holistic understanding of cancer biology, yet it remains challenging because the data are heterogeneous, complex, and sparse. To address this, we introduce MOSA (Multi-Omic Synthetic Augmentation), an unsupervised deep learning model designed to integrate and augment the Cancer Dependency Map (DepMap). Harnessing orthogonal multi-omic information, the model generates molecular and phenotypic profiles, increasing the number of multi-omic profiles by 32.7% and yielding a complete DepMap for 1523 cancer cell lines. The synthetically augmented data increase statistical power, uncover less-studied mechanisms associated with drug resistance, and refine both the identification of genetic associations and the clustering of cancer cell lines. Interpreting MOSA with SHapley Additive exPlanations (SHAP) reveals the multi-omic features that are essential for cell clustering and for identifying biomarkers of drug and gene dependencies. This understanding is crucial for developing much-needed, effective strategies to prioritize cancer targets.