EBioMedicine (Apr 2025)

Deep learning informed multimodal fusion of radiology and pathology to predict outcomes in HPV-associated oropharyngeal squamous cell carcinoma

  • Bolin Song,
  • Amaury Leroy,
  • Kailin Yang,
  • Tanmoy Dam,
  • Xiangxue Wang,
  • Himanshu Maurya,
  • Tilak Pathak,
  • Jonathan Lee,
  • Sarah Stock,
  • Xiao T. Li,
  • Pingfu Fu,
  • Cheng Lu,
  • Paula Toro,
  • Deborah J. Chute,
  • Shlomo Koyfman,
  • Nabil F. Saba,
  • Mihir R. Patel,
  • Anant Madabhushi

DOI
https://doi.org/10.1016/j.ebiom.2025.105663
Journal volume & issue
Vol. 114
p. 105663

Abstract


Background: We aim to predict outcomes of human papillomavirus (HPV)-associated oropharyngeal squamous cell carcinoma (OPSCC), a subtype of head and neck cancer characterized by improved clinical outcomes and better response to therapy. Pathology- and radiology-focused AI-based prognostic models have been developed independently for OPSCC, but their integration, incorporating both the primary tumour (PT) and metastatic cervical lymph nodes (LN), remains unexamined.

Methods: We investigate the prognostic value of an AI approach termed the Swin Transformer-based multimodal and multi-region data fusion framework (SMuRF). SMuRF integrates features from CT corresponding to the PT and LN, as well as whole-slide pathology images from the PT, as a predictor of survival and tumour grade in HPV-associated OPSCC. SMuRF employs cross-modality and cross-region window-based multi-head self-attention mechanisms to capture interactions between features across tumour habitats and image scales.

Findings: Developed and tested on a cohort of 277 patients with OPSCC with matched radiology and pathology images, SMuRF demonstrated strong performance (C-index = 0.81 for disease-free survival [DFS] prediction and AUC = 0.75 for tumour grade classification) and emerged as an independent prognostic biomarker for DFS (hazard ratio [HR] = 17, 95% confidence interval [CI], 4.9–58, p < 0.0001) and tumour grade (odds ratio [OR] = 3.7, 95% CI, 1.4–10.5, p = 0.01), controlling for other clinical variables (i.e., T- and N-stage, age, smoking, sex and treatment modality). Importantly, SMuRF outperformed unimodal models derived from radiology or pathology alone.

Interpretation: Our findings underscore the potential of multimodal deep learning for accurately stratifying OPSCC risk, informing tailored treatment strategies and potentially refining existing treatment algorithms.

Funding: The National Institutes of Health, the U.S. Department of Veterans Affairs and the National Institute of Biomedical Imaging and Bioengineering.
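To make the fusion mechanism described in the Methods concrete, the sketch below shows the general idea of cross-modality multi-head attention, where feature vectors from one modality (e.g. CT regions) act as queries over features from another (e.g. pathology patches). This is a minimal illustrative example with random projection weights standing in for learned parameters; the function name and all details are hypothetical and are not the authors' SMuRF implementation, which additionally uses windowed (Swin-style) attention and cross-region fusion.

```python
import numpy as np

def cross_modal_attention(q_feats, kv_feats, n_heads=2, seed=0):
    """Toy cross-modality multi-head attention.

    q_feats:  (n_q, d) features from the query modality (e.g. CT of PT/LN).
    kv_feats: (n_kv, d) features from the key/value modality (e.g. pathology).
    Returns (n_q, d) fused features: each query row is a softmax-weighted
    mixture of projected key/value rows, computed per head.
    """
    rng = np.random.default_rng(seed)
    d = q_feats.shape[-1]
    assert d % n_heads == 0, "feature dim must split evenly across heads"
    dh = d // n_heads
    # Random projections stand in for learned Wq, Wk, Wv matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    out = np.zeros_like(q_feats, dtype=float)
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        # Scaled dot-product attention within this head's feature slice.
        scores = Q[:, s] @ K[:, s].T / np.sqrt(dh)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[:, s] = weights @ V[:, s]
    return out
```

In a full pipeline, the fused representation would feed a survival head (for DFS) or a classification head (for tumour grade); here the output simply preserves the query modality's shape.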

Keywords