Comparative Analysis of Vision Transformers and Conventional Convolutional Neural Networks in Detecting Referable Diabetic Retinopathy

Jocelyn Hui Lin Goh, BEng; Elroy Ang, BEng; Sahana Srinivasan, BEng; Xiaofeng Lei, MSc; Johnathan Loh, MEng; Ten Cheer Quek, BEng; Cancan Xue, PhD; Xinxing Xu, PhD; Yong Liu, PhD; Ching-Yu Cheng, PhD; Jagath C. Rajapakse, PhD; Yih-Chung Tham, PhD

Ophthalmology Science (Nov 2024)

Affiliations

Jocelyn Hui Lin Goh, BEng: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore
Elroy Ang, BEng: School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
Sahana Srinivasan, BEng: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore
Xiaofeng Lei, MSc: Institute of High-Performance Computing, A*STAR, Singapore, Singapore
Johnathan Loh, MEng: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore
Ten Cheer Quek, BEng: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore
Cancan Xue, PhD: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore
Xinxing Xu, PhD: Institute of High-Performance Computing, A*STAR, Singapore, Singapore
Yong Liu, PhD: Institute of High-Performance Computing, A*STAR, Singapore, Singapore
Ching-Yu Cheng, PhD: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School Singapore, Singapore, Singapore
Jagath C. Rajapakse, PhD: School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore; Correspondence: Jagath C. Rajapakse, PhD, School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore.
Yih-Chung Tham, PhD: Singapore Eye Research Institute, Singapore National Eye Center, Singapore, Singapore; Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore; Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore and National University Health System, Singapore, Singapore; Ophthalmology and Visual Sciences Academic Clinical Program (Eye ACP), Duke-NUS Medical School Singapore, Singapore, Singapore; Correspondence: Yih-Chung Tham, Yong Loo Lin School of Medicine, National University of Singapore, Level 13, MD1 Tahir Foundation Building, 12 Science Drive 2, Singapore 117549.

Journal volume & issue: Vol. 4, no. 6
p. 100552

Abstract

Objective: Vision transformers (ViTs) have shown promising performance in various classification tasks previously dominated by convolutional neural networks (CNNs). However, the performance of ViTs in referable diabetic retinopathy (DR) detection is relatively underexplored. In this study, we compared the performance of ViTs and CNNs in detecting referable DR from retinal photographs.
Design: Retrospective study.
Participants: A total of 48 269 retinal images from the open-source Kaggle DR detection dataset, the Messidor-1 dataset, and the Singapore Epidemiology of Eye Diseases (SEED) study were included.
Methods: Using 41 614 retinal photographs from the Kaggle dataset, we developed 5 CNN models (Visual Geometry Group 19, ResNet50, InceptionV3, DenseNet201, and EfficientNetV2S) and 4 ViT models (VAN_small, CrossViT_small, ViT_small, and Hierarchical Vision Transformer using Shifted Windows [SWIN]_tiny) for the detection of referable DR. We defined referable DR as eyes with moderate or worse DR. The performance of all 9 models was compared in the Kaggle internal test set (1045 study eyes) and in 2 external test sets, SEED (5455 study eyes) and Messidor-1 (1200 study eyes).
Main Outcome Measures: Area under the receiver operating characteristic curve (AUC), specificity, and sensitivity.
Results: Among all models, the SWIN transformer displayed the highest AUC of 95.7% on the internal test set, significantly outperforming the CNN models (all P < 0.001). The same observation was confirmed in the external test sets, with the SWIN transformer achieving AUCs of 97.3% in SEED and 96.3% in Messidor-1. When specificity was fixed at 80% on the internal test set, the SWIN transformer achieved the highest sensitivity of 94.4%, significantly better than all the CNN models (sensitivity ranging between 76.3% and 83.8%; all P < 0.001). This trend was also consistently observed in both external test sets.
Conclusions: Our findings demonstrate that ViTs provide superior performance over CNNs in detecting referable DR from retinal photographs. These results point to the potential of utilizing ViT models to improve and optimize retinal photo-based deep learning for referable DR detection.
Financial Disclosure(s): Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
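
The outcome measures named in the abstract (AUC, and sensitivity at a fixed 80% specificity for the binary label "moderate or worse DR") can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes a 0 to 4 severity grading (0 = no DR, 1 = mild, 2 = moderate, 3 = severe, 4 = proliferative), and the function names, the grade cutoff constant, and the toy scores are all hypothetical. It uses only the Python standard library.

```python
from typing import List, Sequence

# Assumption: grades run 0..4 and "moderate" corresponds to grade 2.
REFERABLE_MIN_GRADE = 2


def to_referable(grades: Sequence[int]) -> List[int]:
    """Map severity grades to binary labels: 1 = referable DR (moderate or worse)."""
    return [1 if g >= REFERABLE_MIN_GRADE else 0 for g in grades]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC area).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(
    labels: Sequence[int], scores: Sequence[float], target_spec: float = 0.80
) -> float:
    """Highest sensitivity over all thresholds whose specificity is >= target_spec.

    An eye is predicted referable when its score is >= the threshold.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    best = 0.0
    # Every distinct score is a candidate threshold; +inf predicts nothing positive.
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(n < t for n in neg) / len(neg)
        if spec >= target_spec:
            sens = sum(p >= t for p in pos) / len(pos)
            best = max(best, sens)
    return best


# Toy data (made up for illustration, not taken from the study).
example_grades = [0, 1, 2, 3, 4, 0, 0, 1, 2, 0]
example_scores = [0.10, 0.40, 0.80, 0.90, 0.70, 0.20, 0.60, 0.30, 0.35, 0.05]
example_labels = to_referable(example_grades)
example_auc = auc(example_labels, example_scores)
example_sens = sensitivity_at_specificity(example_labels, example_scores, 0.80)
```

Fixing specificity before reading off sensitivity puts every model at the same false-positive rate, which is what makes sensitivities such as 94.4% versus 76.3% to 83.8% directly comparable across models.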

Published in Ophthalmology Science

ISSN: 2666-9145 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Ophthalmology
Website: https://www.journals.elsevier.com/ophthalmology-science/
