Scientific Reports (Jan 2023)
Combining convolutional neural networks and self-attention for fundus diseases identification
Abstract
Abstract Early detection of lesions is of great significance for treating fundus diseases. Fundus photography is an effective and convenient screening technique by which common fundus diseases can be detected. In this study, we use color fundus images to distinguish among multiple fundus diseases. Existing research on fundus disease classification has achieved some success through deep learning techniques, but there is still much room for improvement in model evaluation metrics using only deep convolutional neural network (CNN) architectures with limited global modeling ability; the simultaneous diagnosis of multiple fundus diseases still faces great challenges. Therefore, given that the self-attention (SA) model with a global receptive field may have robust global-level feature modeling ability, we propose a multistage fundus image classification model MBSaNet which combines CNN and SA mechanism. The convolution block extracts the local information of the fundus image, and the SA module further captures the complex relationships between different spatial positions, thereby directly detecting one or more fundus diseases in retinal fundus image. In the initial stage of feature extraction, we propose a multiscale feature fusion stem, which uses convolutional kernels of different scales to extract low-level features of the input image and fuse them to improve recognition accuracy. The training and testing were performed based on the ODIR-5k dataset. The experimental results show that MBSaNet achieves state-of-the-art performance with fewer parameters. The wide range of diseases and different fundus image collection conditions confirmed the applicability of MBSaNet.