Data augmentation based semi-supervised method to improve COVID-19 CT classification

Xiangtao Chen; Yuting Bai; Peng Wang; Jiawei Luo

doi:10.3934/mbe.2023294

Mathematical Biosciences and Engineering (Feb 2023)

Affiliations

Xiangtao Chen: 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
Yuting Bai: 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
Peng Wang: 2. College of Computer Science and Engineering, Hunan Institute of Technology, Hengyang 421002, China
Jiawei Luo: 1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China

DOI: https://doi.org/10.3934/mbe.2023294
Journal volume & issue: Vol. 20, no. 4
pp. 6838 – 6852

Abstract

The coronavirus disease (COVID-19) outbreak of December 2019 has become a serious threat to people around the world, creating a health crisis that has infected millions of people and severely damaged the global economy. Early detection and diagnosis are essential to prevent further transmission. Detecting COVID-19 in computed tomography (CT) images is one of the important approaches to rapid diagnosis. Many branches of deep learning have played an important role in this area, including transfer learning, contrastive learning and ensemble strategies. However, these approaches require large numbers of samples with expensive manual labels, so to save costs, researchers have adopted semi-supervised learning, which uses only a few labels to classify COVID-19 CT images. Nevertheless, existing semi-supervised methods focus primarily on class imbalance and pseudo-label filtering rather than on pseudo-label generation. Accordingly, in this paper we propose a semi-supervised classification framework based on data augmentation to classify COVID-19 CT images. We revised the classic teacher-student framework and introduced the popular data augmentation method Mixup, which widened the distribution of high confidence to improve the accuracy of the selected pseudo-labels and ultimately yield a better-performing model. On the COVID-CT dataset, our method improves precision, F1 score, accuracy and specificity by 21.04%, 12.95%, 17.13% and 38.29%, respectively, over the average of the other methods. On the SARS-COV-2 dataset, the corresponding gains are 8.40%, 7.59%, 9.35% and 12.80%. On the Harvard Dataverse dataset, they are 17.64%, 18.89%, 19.81% and 20.20%. The code is available at https://github.com/YutingBai99/COVID-19-SSL.
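
The two ingredients named in the abstract, Mixup and confidence-based pseudo-label selection, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it shows the standard Mixup formulation, in which two samples and their labels are blended with a weight drawn from a Beta(alpha, alpha) distribution, followed by a simple thresholding step on teacher predictions. The function names, the value alpha = 0.4 and the 0.95 threshold are illustrative assumptions, not values taken from the paper.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two samples and their label vectors with a Beta-distributed weight.

    alpha=0.4 is an assumed, commonly used setting, not the paper's value.
    """
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


def select_pseudo_labels(teacher_probs, threshold=0.95):
    """Return (sample index, predicted class) pairs for confident predictions.

    threshold=0.95 is a hypothetical cut-off chosen for illustration.
    """
    selected = []
    for i, probs in enumerate(teacher_probs):
        confidence = max(probs)
        if confidence >= threshold:
            selected.append((i, probs.index(confidence)))
    return selected


if __name__ == "__main__":
    # Toy two-pixel "images" with one-hot labels (COVID vs. non-COVID).
    x_mix, y_mix, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
    print("lambda:", lam, "mixed image:", x_mix, "mixed label:", y_mix)

    # Hypothetical teacher outputs for three unlabeled scans.
    teacher_probs = [[0.97, 0.03], [0.60, 0.40], [0.02, 0.98]]
    print("selected pseudo-labels:", select_pseudo_labels(teacher_probs))
```

In a teacher-student setup, the pairs returned by `select_pseudo_labels` would be added to the student's training set. How Mixup is combined with that selection step in the paper is not specified in the abstract, so the two functions are kept separate here.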

Published in Mathematical Biosciences and Engineering

ISSN: 1551-0018 (Online)
Publisher: AIMS Press
Country of publisher: United States
LCC subjects: Technology: Chemical technology: Biotechnology; Science: Mathematics
Website: https://www.aimspress.com/journal/MBE
