Scientific Reports (Dec 2024)
Evaluating deep learning models for classifying OCT images with limited data and noisy labels
Abstract
The use of deep learning for OCT image classification could enhance the diagnosis and monitoring of retinal diseases. However, challenges such as variability in retinal abnormalities, noise, and artifacts in OCT images limit its clinical use. Our study aimed to evaluate the performance of various deep learning (DL) architectures in classifying retinal pathologies versus healthy cases from OCT images under data scarcity and label noise. We examined five DL architectures: ResNet18, ResNet34, ResNet50, VGG16, and InceptionV3. Pre-trained models were fine-tuned on 5526 OCT images and on progressively reduced subsets, down to 21 images, to evaluate performance under data scarcity. We also evaluated models fine-tuned on subsets with label noise levels of 10%, 15%, and 20%. All DL architectures achieved high classification accuracy (> 90%) with training sets of 345 or more images. InceptionV3 achieved the highest classification accuracy (99%) when trained on the entire training set. However, classification accuracy decreased, and its variability increased, as sample size decreased. Label noise significantly reduced model accuracy: compensating for labeling error rates of 10%, 15%, and 20% required approximately 4, 9, and 14 times as many training images to match the performance achieved with 345 correctly labeled images. The results showed that DL models fine-tuned on sets of 345 or more OCT images can accurately classify retinal pathologies versus healthy controls. Our findings highlight that while mislabeling errors significantly degrade classification performance in OCT analysis, this degradation can be effectively mitigated by increasing the training sample size. By addressing data scarcity and labeling errors, our research aims to improve the real-world applicability and accuracy of retinal disease management.
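The label-noise experiments described above rely on corrupting a known fraction of training labels before fine-tuning. As a minimal sketch of how such symmetric label noise can be injected for a binary (pathology vs. healthy) task, the helper below flips a random subset of 0/1 labels at a given rate; the function name and interface are illustrative, not taken from the paper.

```python
import numpy as np

def flip_labels(labels, noise_rate, rng=None):
    """Return a copy of binary (0/1) labels with a fraction
    `noise_rate` of them flipped, simulating symmetric label noise.

    A random subset of indices of size round(noise_rate * n) is chosen
    without replacement, and each selected label is inverted (0 <-> 1).
    Note: this is an illustrative sketch, not code from the study.
    """
    rng = np.random.default_rng(rng)
    labels = np.asarray(labels).copy()
    n_flip = int(round(noise_rate * labels.size))
    idx = rng.choice(labels.size, size=n_flip, replace=False)
    labels[idx] = 1 - labels[idx]
    return labels

# Example: inject 20% label noise into 100 clean "healthy" (0) labels.
clean = np.zeros(100, dtype=int)
noisy = flip_labels(clean, 0.20, rng=0)
print(int((noisy != clean).sum()))  # count of corrupted labels
```

Fine-tuning a pre-trained model on the noisy labels, then evaluating on a clean held-out set, reproduces the kind of comparison the study reports at each noise level.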
Keywords