IJAIN (International Journal of Advances in Intelligent Informatics) (Jul 2022)

Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans

  • Lyubomir Gotsev,
  • Ivan Mitkov,
  • Eugenia Kovatcheva,
  • Boyan Jekov,
  • Roumen Nikolov,
  • Elena Shoikova,
  • Milena Petkova

DOI
https://doi.org/10.26555/ijain.v8i2.817
Journal volume & issue
Vol. 8, no. 2
pp. 135 – 150

Abstract


The paper presents a brief analysis of publications that use the public SARS-CoV-2 dataset, which consists of patients' computed tomography scans collected at hospitals in Brazil, together with an experimental setup that addresses the data challenges identified. The analysis shows that all protocols but one suffer from data leakage, because the data are organized without grouping images by patient. Each patient is represented by several scans, so images of the same individual may appear in both the training and test sets, which can produce misleading results. Furthermore, only one paper proposed ensemble learning, using VGG-16, ResNet50, and Xception as base models. We therefore proposed and evaluated the following strategy to mitigate the identified risks of bias: data standardization and normalization to achieve proper contrast and resolution; k-means clustering and group shuffle split to avoid data leakage; and augmentation and ensemble transfer learning to cope with the limited sample size and overfitting. Compared with the earlier ensemble approach, ours stacks VGG-16, DenseNet-201, and Inception v3 and achieves higher accuracy (99.3%), the second highest among the related work; most significantly, it applies augmentation and cluster analysis to avoid overestimating performance. In addition, the paper reports metrics that are critical in the medical domain: negative predictive value (99.55%), false positive rate (0.89%), false negative rate (0.42%), and false discovery rate (0.83%). The strategy has two main advantages: it reduces data pitfalls and decreases generalization error. It can serve as a baseline for improving performance and mitigating the risk of bias in the field.
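The leakage-avoidance and evaluation steps named in the abstract can be illustrated with a minimal, dependency-free Python sketch. It assumes each image already carries a group label (for example, a patient ID or a k-means cluster index); the helper names `group_shuffle_split` and `clinical_metrics` are hypothetical, and the splitter only mimics the idea behind scikit-learn's `GroupShuffleSplit` rather than reproducing the authors' pipeline. The metric functions use the standard confusion-matrix definitions, and the counts in the demo are invented for illustration, not taken from the paper.

```python
import random


def group_shuffle_split(groups, test_size=0.2, seed=0):
    """Split sample indices so that no group lands in both train and test.

    groups[i] is the group label (e.g. a patient or cluster id) of sample i.
    Whole groups are shuffled and assigned to the test set, so all scans of
    one group stay on the same side of the split.
    """
    unique = sorted(set(groups))          # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(test_size * len(unique)))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def clinical_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (positive class = COVID-19)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "npv": tn / (tn + fn),   # negative predictive value
        "fpr": fp / (fp + tn),   # false positive rate
        "fnr": fn / (fn + tp),   # false negative rate
        "fdr": fp / (fp + tp),   # false discovery rate
    }


if __name__ == "__main__":
    # Hypothetical group labels: several scans per patient/cluster.
    labels = ["p1", "p1", "p2", "p2", "p3", "p4", "p4", "p5"]
    train, test = group_shuffle_split(labels, test_size=0.4, seed=1)
    print("train:", train, "test:", test)
    # Invented counts, only to show how the metrics are computed.
    print(clinical_metrics(tp=90, fp=5, tn=95, fn=10))
```

Splitting by group rather than by image is what prevents the leakage described above: a random image-level split would let scans of the same patient appear in both sets, whereas here every group lands entirely on one side.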

Keywords