Scientific Reports (Apr 2024)

Vision transformer to differentiate between benign and malignant slices in 18F-FDG PET/CT

  • Daiki Nishigaki,
  • Yuki Suzuki,
  • Tadashi Watabe,
  • Daisuke Katayama,
  • Hiroki Kato,
  • Tomohiro Wataya,
  • Kosuke Kita,
  • Junya Sato,
  • Noriyuki Tomiyama,
  • Shoji Kido

DOI
https://doi.org/10.1038/s41598-024-58220-6
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 11

Abstract

Read online

Abstract Fluorine-18-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET)/computed tomography (CT) is widely used for the detection, diagnosis, and clinical decision-making in oncological diseases. However, in daily medical practice, it is often difficult to make clinical decisions because of physiological FDG uptake or cancers with poor FDG uptake. False negative clinical diagnoses of malignant lesions are critical issues that require attention. In this study, Vision Transformer (ViT) was used to automatically classify 18F-FDG PET/CT slices as benign or malignant. This retrospective study included 18F-FDG PET/CT data of 207 (143 malignant and 64 benign) patients from a medical institute to train and test our models. The ViT model achieved an area under the receiver operating characteristic curve (AUC) of 0.90 [95% CI 0.89, 0.91], which was superior to the baseline Convolutional Neural Network (CNN) models (EfficientNet, 0.87 [95% CI 0.86, 0.88], P < 0.001; DenseNet, 0.87 [95% CI 0.86, 0.88], P < 0.001). Even when FDG uptake was low, ViT produced an AUC of 0.81 [95% CI 0.77, 0.85], which was higher than that of the CNN (DenseNet, 0.65 [95% CI 0.59, 0.70], P < 0.001). We demonstrated the clinical value of ViT by showing its sensitive analysis of easy-to-miss cases of oncological diseases.