Performance Evaluation of Deep Learning Models for Image Classification Over Small Datasets: Diabetic Foot Case Study

Abian Hernandez-Guedes; Idafen Santana-Perez; Natalia Arteaga-Marrero; Himar Fabelo; Gustavo M. Callico; Juan Ruiz-Alzola

doi:10.1109/ACCESS.2022.3225107

IEEE Access (Jan 2022)

Affiliations

Abian Hernandez-Guedes: Research Institute in Biomedical and Health Sciences (IUIBS), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Idafen Santana-Perez: Research Institute in Biomedical and Health Sciences (IUIBS), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Natalia Arteaga-Marrero: IACTEC Medical Technology Group, Instituto de Astrofísica de Canarias (IAC), San Cristóbal de La Laguna, Spain
Himar Fabelo: Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Gustavo M. Callico: Research Institute for Applied Microelectronics (IUMA), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain
Juan Ruiz-Alzola: Research Institute in Biomedical and Health Sciences (IUIBS), University of Las Palmas de Gran Canaria, Las Palmas de Gran Canaria, Spain

DOI: https://doi.org/10.1109/ACCESS.2022.3225107
Journal volume & issue: Vol. 10
pp. 124373 – 124386

Abstract

Data scarcity is a common and challenging issue when working with Artificial Intelligence solutions, especially those that include Deep Learning (DL) models for tasks such as image classification. This is particularly relevant in healthcare scenarios, where data collection is a lengthy process that involves specific control protocols. The performance of DL models is usually quantified with classification metrics, which may give biased results when there is not enough data. In this paper, an innovative approach is proposed to evaluate the performance of DL models when labeled data is scarce. This approach aims to detect poorly performing DL models even when traditional assessment metrics indicate otherwise; it is based on information-theoretic concepts and motivated by the Information Bottleneck framework. The methodology has been evaluated by implementing several experimental configurations to classify samples from a plantar thermogram dataset, focused on early-stage detection of diabetic foot ulcers, as a case study. The proposed network architectures achieved high scores on the classification metrics. However, as our approach shows, only two of those models actually generalize properly from the data. In conclusion, a new methodology was introduced and tested to identify promising DL models for image classification over small datasets without relying exclusively on the widely used classification metrics. Example code and supplementary material using a state-of-the-art DL model are available at https://github.com/mt4sd/PerformanceEvaluationScarceDataset.
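
The abstract does not detail the information-theoretic quantities involved, so the following is only an illustrative sketch, not the authors' method or the code in their repository. It shows one quantity commonly tracked in Information Bottleneck style analyses: a plug-in estimate of the mutual information between a discretized layer representation and the class labels. The function names (`mutual_information`, `discretize`) and the binning scheme are assumptions made for this example. Plug-in estimates are themselves biased on small samples, so any use on a scarce dataset would need extra care.

```python
import numpy as np


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in bits for two discrete 1-D sequences."""
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)

    # Empirical joint distribution from co-occurrence counts.
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)
    joint /= joint.sum()

    # Marginals, kept 2-D so their product has the shape of the joint.
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)

    nz = joint > 0  # skip empty cells to avoid log(0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))


def discretize(activations, n_bins=10):
    """Bin each activation, then map each row (sample) to one discrete symbol.

    Hypothetical helper: equal-width binning is an assumption of this sketch.
    """
    acts = np.asarray(activations, dtype=float)
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    binned = np.digitize(acts, edges[1:-1])
    _, symbols = np.unique(binned, axis=0, return_inverse=True)
    return symbols.reshape(-1)


if __name__ == "__main__":
    # Toy data: four samples, two hidden units, two classes.
    labels = np.array([0, 0, 1, 1])
    hidden = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]])
    print(mutual_information(discretize(hidden), labels))
```

In this toy case the representation separates the two classes perfectly, so the estimate equals the label entropy. On a real model, such estimates could be compared across layers or training epochs alongside accuracy, rather than read as a replacement for it.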

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
