Revista Română de Informatică și Automatică (Jun 2024)
Harnessing the power of vision transformers for enhanced OCT image classification
Abstract
The rising prevalence of eye disorders highlights the need for faster detection of retinal diseases. Early and accurate classification of these conditions is crucial for timely diagnosis and effective treatment, particularly in critical cases. Recent advances in retinal imaging have improved the diagnosis and management of choroidal neovascularization (CNV), diabetic macular edema (DME), and drusen. Deep learning applied to Optical Coherence Tomography (OCT) images has further transformed the field by enabling automated, precise, and efficient disease classification, which paves the way for earlier interventions and better patient outcomes. This study investigates the use of Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) for automated retinal disease classification. Three models were implemented: a ViT, a DeepViT, and a hybrid model combining ResNet50 with a ViT. All three were trained and evaluated on a publicly available OCT dataset. The hybrid model achieved the highest accuracy, 99.97%, which is attributed to its ability to capture both local and global features. These results underscore the potential of ViTs in medical image analysis, and of their integration with CNNs, for building accurate, robust, and scalable diagnostic tools with considerable promise for clinical applications.
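A common reading of "hybrid ResNet50 + ViT" is that the CNN's output feature map replaces raw image patches as the transformer's input tokens. The convolutions supply local features, and self-attention lets every token attend to every other, which supplies global context. The following is a minimal NumPy sketch of that idea under stated assumptions, not the authors' implementation. It assumes a 2048×7×7 feature map, which is the shape ResNet50's last convolutional stage produces for a 224×224 input. The embedding width, the four-class head, the random weights, and the helper names `hybrid_tokens` and `self_attention` are all hypothetical. It also omits layer normalization, multi-head attention, MLP blocks, and stacked layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def hybrid_tokens(feature_map, w_proj, cls_token, pos_embed):
    """Turn a CNN feature map (C, H, W) into a ViT token sequence (1 + H*W, D)."""
    c, h, w = feature_map.shape
    # Each spatial position of the CNN feature map becomes one "patch" token.
    patches = feature_map.reshape(c, h * w).T          # (H*W, C)
    tokens = patches @ w_proj                          # (H*W, D) linear projection
    tokens = np.vstack([cls_token[None, :], tokens])   # prepend a [CLS] token
    return tokens + pos_embed                          # add position embeddings

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention: every token sees all tokens."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)       # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ v, weights

# Hypothetical sizes: 2048 x 7 x 7 matches ResNet50's last stage for a 224 x 224 input;
# the embedding width D and NUM_CLASSES are illustrative assumptions.
C, H, W, D, NUM_CLASSES = 2048, 7, 7, 64, 4
feature_map = rng.standard_normal((C, H, W))           # stands in for real ResNet50 features
w_proj = rng.standard_normal((C, D)) / np.sqrt(C)
cls_token = np.zeros(D)
pos_embed = rng.standard_normal((H * W + 1, D)) * 0.02
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
w_head = rng.standard_normal((D, NUM_CLASSES)) / np.sqrt(D)

tokens = hybrid_tokens(feature_map, w_proj, cls_token, pos_embed)
mixed, attn = self_attention(tokens, w_q, w_k, w_v)
logits = mixed[0] @ w_head                             # classify from the [CLS] token
print(tokens.shape, attn.shape, logits.shape)          # (50, 64) (50, 50) (4,)
```

Each row of `attn` spans all 50 tokens. This is why the attention stage is described as global, whereas each ResNet50 feature only summarizes a limited receptive field of the image.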
Keywords