IEEE Access (Jan 2024)
HRS-Net: A Hybrid Multi-Scale Network Model Based on Convolution and Transformers for Multi-Class Retinal Disease Classification
Abstract
Optical coherence tomography (OCT) is an important basis for retinal diagnosis. Traditional OCT image analysis not only requires considerable manual effort and time but also carries a risk of error. Machine learning (ML) and deep learning (DL) have achieved significant results in the medical field. Convolutional neural network (CNN) models perform well at extracting local features but are less effective at capturing global features; transformer-based architectures excel at extracting global features and can compensate for this shortcoming. This paper proposes a hybrid multi-scale network model based on CNN and the Swin Transformer, called HRS-Net. The model splits into two branches after the convolutional layers of ResNet50: one branch combines attention modules with residual blocks to extract local features, while the other relies mainly on Swin Transformer blocks to extract global features. The two branches are then fused for the multi-class classification of retinal diseases. Experimental results on two public datasets show that the model reaches 98.76% accuracy on the three-class task and 97.16% on the four-class task, outperforming previous classic models.
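To make the two-branch design concrete, the following is a minimal PyTorch sketch of the architecture as described in the abstract, not the authors' implementation. The class name `HRSNetSketch`, the channel sizes, the SE-style channel attention, and the use of a plain `nn.TransformerEncoder` as a stand-in for Swin Transformer blocks are all assumptions made for readability.

```python
# Illustrative sketch only: module names, channel sizes, and the plain
# transformer encoder (in place of Swin blocks) are assumptions, not the
# paper's actual implementation.
import torch
import torch.nn as nn
from torchvision.models import resnet50


class HRSNetSketch(nn.Module):
    def __init__(self, num_classes: int = 4):
        super().__init__()
        backbone = resnet50(weights=None)
        # Shared stem: the early convolutional layers of ResNet50
        # (conv1 -> bn1 -> relu -> maxpool -> layer1), producing 256 channels.
        self.stem = nn.Sequential(
            backbone.conv1, backbone.bn1, backbone.relu,
            backbone.maxpool, backbone.layer1,
        )

        # Local branch: remaining residual stages plus a simple SE-style
        # channel attention (the paper's exact attention module is not
        # specified in the abstract).
        self.local_branch = nn.Sequential(
            backbone.layer2, backbone.layer3, backbone.layer4,
        )
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(2048, 128, 1), nn.ReLU(),
            nn.Conv2d(128, 2048, 1), nn.Sigmoid(),
        )

        # Global branch: project shared features into tokens and apply
        # transformer encoder layers (standing in for Swin Transformer blocks).
        self.proj = nn.Conv2d(256, 256, kernel_size=2, stride=2)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=256, nhead=8, dim_feedforward=512, batch_first=True,
        )
        self.global_branch = nn.TransformerEncoder(encoder_layer, num_layers=4)

        # Fusion of the two branches followed by the classification head.
        self.classifier = nn.Linear(2048 + 256, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shared = self.stem(x)                                   # (B, 256, H/4, W/4)

        local = self.local_branch(shared)                       # (B, 2048, H/32, W/32)
        local = local * self.channel_attn(local)                # re-weight channels
        local = local.mean(dim=(2, 3))                          # (B, 2048)

        tokens = self.proj(shared).flatten(2).transpose(1, 2)   # (B, N, 256)
        global_feat = self.global_branch(tokens).mean(dim=1)    # (B, 256)

        fused = torch.cat([local, global_feat], dim=1)          # feature fusion
        return self.classifier(fused)


if __name__ == "__main__":
    logits = HRSNetSketch(num_classes=4)(torch.randn(2, 3, 224, 224))
    print(logits.shape)  # torch.Size([2, 4])
```

The shared stem keeps the two branches operating on the same mid-level feature map, so the local branch can deepen with residual stages while the global branch reasons over the whole image via self-attention before the concatenated features reach the classifier.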
Keywords