Deep learning informed multimodal fusion of radiology and pathology to predict outcomes in HPV-associated oropharyngeal squamous cell carcinoma
Bolin Song,
Amaury Leroy,
Kailin Yang,
Tanmoy Dam,
Xiangxue Wang,
Himanshu Maurya,
Tilak Pathak,
Jonathan Lee,
Sarah Stock,
Xiao T. Li,
Pingfu Fu,
Cheng Lu,
Paula Toro,
Deborah J. Chute,
Shlomo Koyfman,
Nabil F. Saba,
Mihir R. Patel,
Anant Madabhushi
Affiliations
Bolin Song
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Amaury Leroy
Therapanacea, Paris, France
Kailin Yang
Department of Radiation Oncology, Holden Comprehensive Cancer Center, Iowa Neuroscience Institute, University of Iowa, Iowa City, IA, USA
Tanmoy Dam
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Xiangxue Wang
Institute of Artificial Intelligence in Medicine, School of Artificial Intelligence in Medicine, Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
Himanshu Maurya
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Tilak Pathak
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA
Jonathan Lee
Diagnostics Institute, Cleveland Clinic, Cleveland, OH, USA
Sarah Stock
Diagnostics Institute, Cleveland Clinic, Cleveland, OH, USA
Xiao T. Li
Department of Radiology and Imaging Sciences, Emory University Hospital, Atlanta, GA, USA
Pingfu Fu
Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
Cheng Lu
Department of Radiology, Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Guangzhou, China
Paula Toro
Department of Pathology, Cleveland Clinic, Cleveland, OH, USA
Deborah J. Chute
Department of Pathology, Cleveland Clinic, Cleveland, OH, USA
Shlomo Koyfman
Department of Radiation Oncology, Taussig Cancer Center, Cleveland Clinic, Cleveland, OH, USA
Nabil F. Saba
Department of Hematology and Medical Oncology, Winship Cancer Institute, Atlanta, GA, USA
Mihir R. Patel
Department of Otolaryngology, Winship Cancer Institute, Atlanta, GA, USA
Anant Madabhushi
Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, USA; Atlanta Veterans Administration Medical Center, Atlanta, GA, USA. Corresponding author.
Summary

Background: We aim to predict outcomes of human papillomavirus (HPV)-associated oropharyngeal squamous cell carcinoma (OPSCC), a subtype of head and neck cancer characterized by improved clinical outcomes and better response to therapy. AI-based prognostic models focused on pathology and on radiology have been developed independently for OPSCC, but their integration, incorporating both the primary tumour (PT) and metastatic cervical lymph nodes (LN), remains unexamined.

Methods: We investigate the prognostic value of an AI approach termed the Swin Transformer-based multimodal and multi-region data fusion framework (SMuRF). SMuRF integrates features from CT images of the PT and LN, together with whole-slide pathology images of the PT, to predict survival and tumour grade in HPV-associated OPSCC. SMuRF employs cross-modality and cross-region window-based multi-head self-attention mechanisms to capture interactions between features across tumour habitats and image scales.

Findings: Developed and tested on a cohort of 277 patients with OPSCC with matched radiology and pathology images, SMuRF demonstrated strong performance (C-index = 0.81 for disease-free survival [DFS] prediction and AUC = 0.75 for tumour grade classification) and emerged as an independent prognostic biomarker for DFS (hazard ratio [HR] = 17, 95% confidence interval [CI] 4.9–58, p < 0.0001) and tumour grade (odds ratio [OR] = 3.7, 95% CI 1.4–10.5, p = 0.01), controlling for other clinical variables (i.e., T- and N-stage, age, smoking, sex, and treatment modality). Importantly, SMuRF outperformed unimodal models derived from radiology or pathology alone.

Interpretation: Our findings underscore the potential of multimodal deep learning to accurately stratify OPSCC risk, inform tailored treatment strategies, and potentially refine existing treatment algorithms.

Funding: The National Institutes of Health, the U.S. Department of Veterans Affairs, and the National Institute of Biomedical Imaging and Bioengineering.
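To make the cross-modality fusion step in Methods concrete, the sketch below shows how token sequences from three sources (CT of the PT, CT of the LN, and pathology tiles from the PT) can be concatenated and passed through multi-head self-attention so that tokens from one modality or region attend to the others. This is a minimal illustration of the general technique, not the authors' SMuRF implementation: the class name, shapes, token counts, mean pooling, and single risk head are all assumptions, and the windowed Swin-style attention described in the paper is replaced here with plain global self-attention for brevity.

```python
# Minimal sketch of cross-modality/cross-region self-attention fusion.
# All names, dimensions, and the pooling/head choices are illustrative
# assumptions; they are not the published SMuRF architecture.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    """Fuse tokens from CT (primary tumour), CT (lymph node), and
    pathology tiles with multi-head self-attention, so every token can
    attend to tokens from the other modalities and tumour regions."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # e.g. a scalar survival risk score

    def forward(self, ct_pt, ct_ln, path_pt):
        # Each input: (batch, n_tokens, dim); token counts may differ per source.
        tokens = torch.cat([ct_pt, ct_ln, path_pt], dim=1)
        x = self.norm(tokens)
        fused, attn_weights = self.attn(x, x, x)  # cross-modality interactions
        fused = fused + tokens                    # residual connection
        pooled = fused.mean(dim=1)                # simple mean pooling over tokens
        return self.head(pooled), attn_weights


if __name__ == "__main__":
    # Random features stand in for the outputs of per-modality encoders.
    b, d = 2, 256
    model = CrossModalFusion(dim=d)
    risk, attn = model(
        torch.randn(b, 16, d),  # CT primary-tumour tokens
        torch.randn(b, 8, d),   # CT lymph-node tokens
        torch.randn(b, 32, d),  # pathology tile tokens
    )
    print(risk.shape)  # torch.Size([2, 1])
```

The returned attention weights indicate which tokens (and hence which modality or tumour region) drive each fused representation, which is one way such a model can surface cross-habitat interactions of the kind the paper describes.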