Analyzing the Effect of Basic Data Augmentation for COVID-19 Detection through a Fractional Factorial Experimental Design

Mateo Hidalgo Davila; Maria Baldeon-Calisto; Juan Jose Murillo; Bernardo Puente-Mejia; Danny Navarrete; Daniel Riofrío; Noel Peréz; Diego S. Benítez; Ricardo Flores Moyano

doi:10.28991/ESJ-2023-SPER-01

Emerging Science Journal (Sep 2022)

Affiliations

Mateo Hidalgo Davila: Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Diego de Robles s/n y Vía Interoceánica, Quito 170901, Ecuador.
Maria Baldeon-Calisto: 1) Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Diego de Robles s/n y Vía Interoceánica, Quito 170901, Ecuador. 2) Applied Signal Processing and Machine Learning Research Group USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador.
Juan Jose Murillo: Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Diego de Robles s/n y Vía Interoceánica, Quito 170901, Ecuador.
Bernardo Puente-Mejia: Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Diego de Robles s/n y Vía Interoceánica, Quito 170901, Ecuador.
Danny Navarrete: Departamento de Ingeniería Industrial and Instituto de Innovación en Productividad y Logística CATENA-USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Diego de Robles s/n y Vía Interoceánica, Quito 170901, Ecuador.
Daniel Riofrío: Applied Signal Processing and Machine Learning Research Group USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador.
Noel Peréz: Applied Signal Processing and Machine Learning Research Group USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador.
Diego S. Benítez: Applied Signal Processing and Machine Learning Research Group USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador.
Ricardo Flores Moyano: Applied Signal Processing and Machine Learning Research Group USFQ, Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Quito 170901, Ecuador.

Journal volume & issue: Vol. 7, no. 0
pp. 1 – 16

Abstract

The COVID-19 pandemic has created a worldwide healthcare crisis. Convolutional Neural Networks (CNNs) have recently been used with encouraging results to help detect COVID-19 from chest X-ray images. However, to generalize well to unseen data, CNNs require large labeled datasets. Because publicly available COVID-19 datasets are scarce, most CNNs apply various data augmentation techniques during training. Yet there has been no thorough statistical analysis of how data augmentation operations affect classification performance in COVID-19 detection. In this study, a fractional factorial experimental design is used to examine the impact of basic augmentation methods on COVID-19 detection. This design makes it possible to identify which individual data augmentation techniques, and which interactions between them, have a statistically significant positive or negative impact on classification performance. Using the CoroNet architecture and two publicly available COVID-19 datasets, the most common basic augmentation methods in the literature are evaluated. The experimental results show that zoom range and height shift positively impact the model's accuracy on dataset 1, whereas none of the data augmentation operations significantly affects performance on dataset 2. Additionally, training CoroNet with the optimal data augmentation values identified through the experimental design achieves new state-of-the-art performance on both datasets: 97% accuracy, 93% precision, and 97.7% recall on dataset 1, and 97% accuracy, 97% precision, and 97.6% recall on dataset 2. These results indicate that analyzing the effects of data augmentation on the specific task and dataset is essential for achieving the best performance.
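
To illustrate how a fractional factorial design screens augmentation operations, the following is a minimal sketch, not the authors' procedure. It assumes five two-level factors with hypothetical names modeled on common augmentation parameters (the abstract does not list the full factor set or the levels used) and builds a 2^(5-1) resolution V design with generator E = ABCD, so 16 training runs stand in for the 32 of a full factorial. Main effects are estimated as the mean response at the high level minus the mean response at the low level. The response values in the usage example are a synthetic toy function, not results from the paper.

```python
from itertools import product

# Hypothetical factor names; the abstract does not specify the full factor set.
FACTORS = ["rotation", "width_shift", "height_shift", "zoom_range", "horizontal_flip"]


def fractional_factorial_2_5_1():
    """Return the 16 runs of a 2^(5-1) design in coded units (-1 = low, +1 = high).

    The first four factors form a full 2^4 factorial; the fifth is aliased
    through the generator E = ABCD (defining relation I = ABCDE, resolution V),
    so main effects are not aliased with any two-factor interaction.
    """
    runs = []
    for a, b, c, d in product((-1, 1), repeat=4):
        e = a * b * c * d
        runs.append((a, b, c, d, e))
    return runs


def main_effects(runs, responses):
    """Estimate each main effect as mean(y at +1) minus mean(y at -1)."""
    effects = {}
    for j, name in enumerate(FACTORS):
        hi = [y for run, y in zip(runs, responses) if run[j] == 1]
        lo = [y for run, y in zip(runs, responses) if run[j] == -1]
        effects[name] = sum(hi) / len(hi) - sum(lo) / len(lo)
    return effects


if __name__ == "__main__":
    design = fractional_factorial_2_5_1()
    # Synthetic toy response (NOT data from the paper): accuracy that rises
    # with the zoom and height-shift levels and ignores the other factors.
    toy = [0.90 + 0.02 * run[3] + 0.01 * run[2] for run in design]
    for name, effect in main_effects(design, toy).items():
        print(f"{name:16s} {effect:+.3f}")
```

In a real study each run would train the CNN with the augmentation settings mapped from the coded levels, and the effect estimates would then be tested for significance (for example with ANOVA) rather than read off directly.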

Published in Emerging Science Journal

ISSN: 2610-9182 (Online)
Publisher: Ital Publication
Country of publisher: Italy
LCC subjects: Technology: Technology (General); Social Sciences: Social sciences (General)
Website: http://ijournalse.org
