Informatics in Medicine Unlocked (Jan 2024)
Deciphering the impact of diversity in CNN-based ensembles on overcoming data imbalance and scarcity in medical datasets: A case study on diabetic retinopathy
Abstract
Early detection of diabetic retinopathy (DR) is critical in preventing vision loss. However, building accurate artificial intelligence (AI) models for multiple classes, including early-stage (Class 1) detection, is challenging because medical datasets are limited and imbalanced, and their availability is restricted by ethical and privacy concerns. Traditional ensemble models, which require structured data, also struggle with raw medical images, further complicating the issue. This study presents a novel deep learning-based ensemble model (EM) designed for multi-class DR classification, with particular emphasis on precise early-stage (Class 1) detection. The EM uses eight diverse Convolutional Neural Networks (CNNs) with carefully crafted strategies to enhance diversity. Data augmentation and generation techniques address imbalanced data through data diversity, while diversity in parameters and architecture within the CNN-based EM maximizes predictive performance. Evaluation on the publicly available Kaggle APTOS DR dataset demonstrates significant improvement over individual models and existing approaches. The proposed EM achieves a multi-class accuracy of 93.00%, precision of 93.00%, sensitivity of 98.00%, and specificity of 99.00%. This research highlights the effectiveness of diversified CNN ensembles in overcoming the challenges posed by imbalanced and scarce data for multi-class DR classification. This approach paves the way for developing robust and accurate AI-powered diagnostic tools for improved diabetic retinopathy screening.
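For readers less familiar with how sensitivity and specificity are obtained in a multi-class setting, the following is a minimal sketch that assumes the common one-vs-rest convention: each class in turn is treated as "positive" and all other classes as "negative". The abstract does not state which averaging convention was used, so this is an illustration rather than the authors' evaluation code. The function name `one_vs_rest_metrics` and the three-class confusion matrix are hypothetical and are not taken from the paper.

```python
def one_vs_rest_metrics(confusion):
    """Compute overall accuracy and per-class one-vs-rest metrics.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j (square matrix, one row per class).
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    per_class = []
    for k in range(n):
        tp = confusion[k][k]                               # class k predicted as k
        fn = sum(confusion[k]) - tp                        # class k predicted as something else
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp                          # everything not involving class k
        per_class.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return correct / total, per_class


# Hypothetical three-class confusion matrix, for illustration only.
toy_confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

accuracy, metrics = one_vs_rest_metrics(toy_confusion)
print(f"accuracy = {accuracy:.4f}")
for k, m in enumerate(metrics):
    print(k, {name: round(value, 4) for name, value in m.items()})
```

Because the true negatives of each class include every correctly or incorrectly classified sample from all other classes, one-vs-rest specificity tends to be high on imbalanced data, which is one reason specificity can exceed overall accuracy in multi-class reports.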