IEEE Access (Jan 2021)
Active Learning Strategy for COVID-19 Annotated Dataset
Abstract
The efficient diagnosis of COVID-19 plays a key role in preventing its spread. Recently, many artificial intelligence techniques, such as deep neural networks, have been applied to support the efficient diagnosis of COVID-19. However, the accuracy of deep learning depends on the tuning of many hyperparameters and on a large amount of labeled data. For COVID-19, this data bottleneck is compounded by a shortage of human resources for data labeling, which presents a challenging obstacle. In this paper, a novel discriminative batch-mode active learning framework (DS3) is proposed to allow faster and more effective COVID-19 data annotation. The framework is specifically designed to suit the imbalanced data phenomenon that is characteristic of COVID-19 data. Extensive experiments on four public real-world COVID-19 datasets from several countries, including Brazil, China, Israel, and Mexico, show that our active learning framework significantly outperforms other state-of-the-art models. Our proposed framework achieves an average G-Mean improvement of 10% across the four datasets. Finally, the results of significance testing verify the effectiveness of DS3 and its superiority over baseline active learning algorithms.
Keywords