Deep Learning Architectures for Diagnosis of Diabetic Retinopathy

Alberto Solano; Kevin N. Dietrich; Marcelino Martínez-Sober; Regino Barranquero-Cardeñosa; Jorge Vila-Tomás; Pablo Hernández-Cámara

doi:10.3390/app13074445

Applied Sciences (Mar 2023)

Affiliations

Alberto Solano: Intelligent Data Analysis Laboratory, ETSE (Engineering School), Universitat de València, 46100 Burjassot, Spain
Kevin N. Dietrich: Intelligent Data Analysis Laboratory, ETSE (Engineering School), Universitat de València, 46100 Burjassot, Spain
Marcelino Martínez-Sober: Intelligent Data Analysis Laboratory, ETSE (Engineering School), Universitat de València, 46100 Burjassot, Spain
Regino Barranquero-Cardeñosa: Intelligent Data Analysis Laboratory, ETSE (Engineering School), Universitat de València, 46100 Burjassot, Spain
Jorge Vila-Tomás: Image Processing Lab., Universitat de València, 46980 Paterna, Spain
Pablo Hernández-Cámara: Image Processing Lab., Universitat de València, 46980 Paterna, Spain

DOI: https://doi.org/10.3390/app13074445
Journal volume & issue: Vol. 13, no. 7, p. 4445

Abstract

For many years, convolutional neural networks dominated computer vision, not least in the medical field, where problems such as image segmentation were addressed by networks such as the U-Net. The arrival of self-attention-based networks in computer vision through ViTs seems to have shifted the trend away from standard convolutions. In this work, we apply different architectures, namely U-Net, ViTs and ConvMixer, and compare their performance on a medical semantic segmentation problem. All models were trained from scratch on the DRIVE dataset and evaluated on their private counterparts to determine which performed best on the segmentation task. Our major contribution is showing that the best-performing model (ConvMixer) is the one that shares the ViT's approach (processing images as patches) while keeping the U-Net's foundational blocks (convolutions). This mixture not only produces better results (DICE=0.83) than both ViTs (0.80/0.77 for UNETR/SWIN-Unet) and the U-Net (0.82) on their own, but also considerably reduces the number of parameters (2.97M against 104M/27M and 31M, respectively). This shows that there is no need to systematically use large models for image problems where smaller architectures with the right components can achieve better results.
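
The DICE scores quoted in the abstract measure the overlap between a predicted segmentation mask and the reference mask. The abstract does not describe how the authors computed the score, so the following is only a minimal sketch of the standard definition, 2|A ∩ B| / (|A| + |B|), for binary masks. The function name `dice_coefficient` and the toy masks are hypothetical and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two binary masks given as flat sequences of 0/1.

    Assumption: masks are already thresholded and flattened to 1-D;
    a 2-D mask would need to be flattened row by row first.
    """
    pred = [bool(p) for p in pred]
    target = [bool(t) for t in target]
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)

    # Convention chosen here: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example (illustrative values only, not data from the paper).
predicted = [1, 1, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)
```

In the toy example, two foreground pixels overlap and each mask has three foreground pixels, so the score is 2·2 / (3 + 3) ≈ 0.67. A score of 1.0 means the masks agree exactly, and 0.0 means they share no foreground pixels.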

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
