IEEE Access (Jan 2022)

A Hybrid VDV Model for Automatic Diagnosis of Pneumothorax Using Class-Imbalanced Chest X-Rays Dataset

  • Tahira Iqbal,
  • Arslan Shaukat,
  • Muhammad Usman Akram,
  • Abdul Wahab Muzaffar,
  • Zartasha Mustansar,
  • Yung-Cheol Byun

DOI
https://doi.org/10.1109/ACCESS.2022.3157316
Journal volume & issue
Vol. 10
pp. 27670 – 27683

Abstract

Read online

Pneumothorax, a life-threatening disease, needs to be diagnosed immediately and efficiently. The prognosis, in this case, is not only time-consuming but also prone to human errors. So, an automatic way of accurate diagnosis using chest X-rays is the utmost requirement. To date, most of the available medical image datasets have a class-imbalance (CI) issue. The main theme of this study is to solve this problem along with proposing an automated way of detecting pneumothorax. To find the optimal approach for CI problem, we first compare the existing approaches and find that under-bagging method (referred as data-level-ensemble formed by creating subsets of majority class and then combining each subset with all samples of minority class) outperforms other existing approaches. After selection of best approach for CI problem, we propose a novel framework, named as VDV model, for pneumothorax detection from highly imbalance dataset. The proposed VDV model is a complex model-level ensemble of data-level-ensembles and uses three convolutional neural networks (CNN) including VGG16, VGG-19, and DenseNet-121 as fixed feature extractors. In each data-level-ensemble, features extracted from one of the pre-defined CNN architectures are fed to support vector machine (SVM) classifier, and output is calculated using the voting method. Once outputs from the three data-level-ensembles (corresponding to three different CNN architectures as feature extractor) are obtained, then, again, the voting method is used to calculate the final prediction. Our proposed framework is tested on the SIIM ACR Pneumothorax dataset and Random Sample of NIH Chest X-ray dataset (RS-NIH). For the first dataset, 85.17% Recall with 86.0% Area under the Receiver Operating Characteristic curve (AUC) is attained. For the second dataset, 90.9% Recall with 95.0% AUC is achieved with a random split of data while 85.45% recall with 77.06% AUC is obtained with a patient-wise split of data. The comparison of our results for both the datasets with related work proves the effectiveness of proposed VDV model for pneumothorax detection.

Keywords