Nature Communications (Nov 2024)

Deep generative AI models analyzing circulating orphan non-coding RNAs enable detection of early-stage lung cancer

  • Mehran Karimzadeh,
  • Amir Momen-Roknabadi,
  • Taylor B. Cavazos,
  • Yuqi Fang,
  • Nae-Chyun Chen,
  • Michael Multhaup,
  • Jennifer Yen,
  • Jeremy Ku,
  • Jieyang Wang,
  • Xuan Zhao,
  • Philip Murzynowski,
  • Kathleen Wang,
  • Rose Hanna,
  • Alice Huang,
  • Diana Corti,
  • Dang Nguyen,
  • Ti Lam,
  • Seda Kilinc,
  • Patrick Arensdorf,
  • Kimberly H. Chau,
  • Anna Hartwig,
  • Lisa Fish,
  • Helen Li,
  • Babak Behsaz,
  • Olivier Elemento,
  • James Zou,
  • Fereydoun Hormozdiari,
  • Babak Alipanahi,
  • Hani Goodarzi

DOI
https://doi.org/10.1038/s41467-024-53851-9
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 12

Abstract

Liquid biopsies have the potential to revolutionize cancer care through non-invasive early detection of tumors. Developing a robust liquid biopsy test requires collecting high-dimensional data from a large number of blood samples across heterogeneous groups of patients. We propose that the generative capability of variational auto-encoders enables learning a robust and generalizable signature of blood-based biomarkers. In this study, we analyze orphan non-coding RNAs (oncRNAs) from serum samples of 1050 individuals diagnosed with non-small cell lung cancer (NSCLC) at various stages, as well as sex-, age-, and BMI-matched controls. We demonstrate that our multi-task generative AI model, Orion, surpasses commonly used methods in both overall performance and generalizability to held-out datasets. Orion achieves an overall sensitivity of 94% (95% CI: 87%–98%) at 87% (95% CI: 81%–93%) specificity for cancer detection across all stages, outperforming the sensitivity of other methods on held-out validation datasets by more than ~30%.
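
The abstract reports sensitivity at a given specificity, each with a 95% confidence interval. As a rough illustration of how such figures can be derived from classifier scores, the sketch below computes sensitivity and specificity at a score threshold and attaches a Wilson score interval. The abstract does not state which interval method the authors used, so the Wilson interval is an assumption, as are the function names and the toy labels and scores, which are not data from the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


def confusion_counts(labels, scores, threshold):
    """Count TP, FN, TN, FP, calling a sample positive when score >= threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp, fn, tn, fp


# Toy example (hypothetical values): 1 = cancer, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]

tp, fn, tn, fp = confusion_counts(labels, scores, threshold=0.5)
sensitivity = tp / (tp + fn)  # fraction of cancer samples detected
specificity = tn / (tn + fp)  # fraction of controls correctly cleared
sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity={sensitivity:.2f}, 95% CI=({sens_ci[0]:.2f}, {sens_ci[1]:.2f})")
print(f"specificity={specificity:.2f}, 95% CI=({spec_ci[0]:.2f}, {spec_ci[1]:.2f})")
```

In practice the threshold would be chosen on control samples to hit a target specificity, and sensitivity would then be read off at that threshold on held-out data.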