IEEE Access (Jan 2024)
Overcoming Overconfidence for Active Learning
Abstract
Recent advances in artificial intelligence depend heavily on vast amounts of high-quality data. However, a persistent challenge is the limited budget allocated for data labeling. Active learning is a prominent and efficient strategy for addressing this: a model iteratively selects valuable data for labeling and is then updated with the newly labeled data. However, the limited data available in each iteration renders the model susceptible to bias, resulting in overconfident predictions. To mitigate this issue, we propose the Overcoming Overconfidence for Active Learning (OO4AL) framework. It comprises two components: Cross-Mix-and-Mix, an augmentation strategy that broadens the training distribution to calibrate the model, and Ranked Margin Sampling, a selection strategy that prevents the selection of overconfidence-inducing data by evaluating the model's predictions. Through comprehensive experiments and analyses, we demonstrate that our framework enables efficient data selection by reducing overconfidence while remaining simple to implement.
Keywords