Augmentation strategies for an imbalanced learning problem on a novel COVID-19 severity dataset

Daniel Schaudt; Reinhold von Schwerin; Alexander Hafner; Pascal Riedel; Manfred Reichert; Marianne von Schwerin; Meinrad Beer; Christopher Kloth

doi:10.1038/s41598-023-45532-2

Scientific Reports (Oct 2023)

Affiliations

Daniel Schaudt: Department of Computer Science, Ulm University of Applied Science
Reinhold von Schwerin: Department of Computer Science, Ulm University of Applied Science
Alexander Hafner: Department of Computer Science, Ulm University of Applied Science
Pascal Riedel: Department of Computer Science, Ulm University of Applied Science
Manfred Reichert: Institute of Databases and Information Systems, Ulm University
Marianne von Schwerin: Department of Computer Science, Ulm University of Applied Science
Meinrad Beer: Department of Radiology, University Hospital of Ulm
Christopher Kloth: Department of Radiology, University Hospital of Ulm

DOI: https://doi.org/10.1038/s41598-023-45532-2
Journal volume & issue: Vol. 13, no. 1, pp. 1–16

Abstract

Since the beginning of the COVID-19 pandemic, many different machine learning models have been developed to detect and verify COVID-19 pneumonia based on chest X-ray images. Although promising, binary models have only limited implications for medical treatment, whereas the prediction of disease severity suggests more suitable and specific treatment options. In this study, we publish severity scores for the 2358 COVID-19 positive images in the COVIDx8B dataset, creating one of the largest collections of publicly available COVID-19 severity data. Furthermore, we train and evaluate deep learning models on the newly created dataset to provide a first benchmark for the severity classification task. One of the main challenges of this dataset is the skewed class distribution, resulting in undesirable model performance for the most severe cases. We therefore propose and examine different augmentation strategies, specifically targeting majority and minority classes. Our augmentation strategies show significant improvements in precision and recall values for the rare and most severe cases. While the models might not yet fulfill medical requirements, they serve as an appropriate starting point for further research with the proposed dataset to optimize clinical resource allocation and treatment.
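The abstract's two technical ideas, per-class precision and recall under a skewed class distribution and augmentation that depends on class frequency, can be illustrated with a small sketch. This is not the authors' implementation: the integer class labels, the toy predictions, and both function names (`per_class_precision_recall`, `augmentation_factors`) are assumptions made purely for illustration, and the "augment until each class roughly matches the largest one" rule is one simple possibility, not necessarily the strategy the paper examines.

```python
import math
from collections import Counter


def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label that occurs.

    A class with no predicted (or no true) samples gets 0.0 by convention.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    metrics = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics[label] = (precision, recall)
    return metrics


def augmentation_factors(labels):
    """Hypothetical rule: how many variants (original included) to generate
    per image so that each class roughly matches the largest class."""
    counts = Counter(labels)
    largest = max(counts.values())
    return {label: math.ceil(largest / n) for label, n in counts.items()}


# Toy data (assumed): class 0 is frequent, class 2 is the rare, severe class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
# A model biased toward the majority class never predicts class 2.
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

metrics = per_class_precision_recall(y_true, y_pred)
factors = augmentation_factors(y_true)
```

In this toy setting the rare class ends up with zero recall even though most predictions are correct overall, which is the kind of failure that motivates reporting per-class metrics and augmenting minority classes more heavily than majority ones.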

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
