IEEE Access (Jan 2024)
Data Distribution-Based Curriculum Learning
Abstract
The order of training samples can significantly affect a model’s performance. Curriculum learning trains a model gradually by presenting samples in order from ‘easy’ to ‘hard’. This paper proposes a novel curriculum learning strategy called Data Distribution-based Curriculum Learning (DDCL). DDCL uses the inherent data distribution of a dataset to determine the order in which training samples are presented. The approach incorporates two distinct scoring methods, DDCL-Density and DDCL-Point, to rank the training samples. DDCL-Density assigns scores based on sample density, favoring denser regions that can make initial learning easier. Conversely, DDCL-Point scores each sample by its Euclidean distance from the dataset centroid, which serves as a reference point, providing an alternative perspective on sample difficulty. We evaluate DDCL through experiments with various classifiers on a diverse set of small to medium-sized medical datasets. Results show that DDCL improves classification accuracy, achieving increases ranging from 2% to 10% compared with baseline methods and other state-of-the-art techniques. Moreover, an analysis of the error losses for a single training epoch reveals that DDCL not only improves accuracy but also increases the convergence rate, underlining its potential for more efficient training. The findings suggest that DDCL may be particularly beneficial for medical applications, where data is often limited, and indicate promising directions for future research in domains that involve limited datasets.
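The two scoring ideas summarized above can be illustrated with a short sketch. It is not the authors' implementation: the function names `ddcl_point_order` and `ddcl_density_order` are hypothetical, treating samples close to the centroid as ‘easy’ is an assumption, and because the abstract does not state how density is estimated, the sketch substitutes a k-nearest-neighbour proxy (inverse mean distance to the k closest samples) with a hypothetical parameter `k`.

```python
import math
from typing import List, Sequence

Sample = Sequence[float]


def ddcl_point_order(X: Sequence[Sample]) -> List[int]:
    """Hypothetical DDCL-Point-style ordering (sketch, not the paper's code).

    Scores each sample by its Euclidean distance to the dataset centroid and
    returns indices from smallest to largest distance. Treating samples near
    the centroid as 'easy' is an assumption of this sketch.
    """
    n = len(X)
    centroid = [sum(col) / n for col in zip(*X)]  # per-feature mean
    scores = [math.dist(x, centroid) for x in X]
    return sorted(range(n), key=lambda i: scores[i])


def ddcl_density_order(X: Sequence[Sample], k: int = 3) -> List[int]:
    """Hypothetical DDCL-Density-style ordering (sketch, not the paper's code).

    Density is approximated with a k-nearest-neighbour proxy (an assumption,
    not the paper's stated estimator): the inverse of the mean distance to the
    k closest other samples. Indices are returned densest first.
    """
    n = len(X)
    if n < 2:
        return list(range(n))  # nothing to compare against
    scores = []
    for i, xi in enumerate(X):
        dists = sorted(math.dist(xi, xj) for j, xj in enumerate(X) if j != i)
        kk = min(k, len(dists))
        mean_dist = sum(dists[:kk]) / kk
        scores.append(1.0 / (mean_dist + 1e-12))  # epsilon guards duplicate points
    # Negating the score sorts densest first while keeping ties in input order.
    return sorted(range(n), key=lambda i: -scores[i])


if __name__ == "__main__":
    # Toy data: a tight unit square plus one distant outlier at index 4.
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
    print("DDCL-Point order:  ", ddcl_point_order(X))
    print("DDCL-Density order:", ddcl_density_order(X, k=2))
```

On this toy data both orderings place the outlier at index 4 last, so it would be presented at the end of the curriculum. How the ordered samples are then batched or paced during training is not described in the abstract and is left open here.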
Keywords