PLoS ONE (Jan 2020)

Analyzing inter-reader variability affecting deep ensemble learning for COVID-19 detection in chest radiographs.

  • Sivaramakrishnan Rajaraman,
  • Sudhir Sornapudi,
  • Philip O Alderson,
  • Les R Folio,
  • Sameer K Antani

DOI
https://doi.org/10.1371/journal.pone.0242301
Journal volume & issue
Vol. 15, no. 11
p. e0242301

Abstract

Data-driven deep learning (DL) methods using convolutional neural networks (CNNs) demonstrate promising performance in natural image computer vision tasks. However, their use in medical computer vision tasks faces several limitations, viz., (i) adapting to visual characteristics that are unlike natural images; (ii) modeling random noise during training owing to stochastic optimization and backpropagation-based learning; (iii) challenges in explaining DL black-box behavior to support clinical decision-making; and (iv) inter-reader variability in the ground truth (GT) annotations affecting learning and evaluation. This study proposes a systematic approach to address these limitations and applies it to the pandemic-driven need for Coronavirus disease 2019 (COVID-19) detection using chest X-rays (CXRs). Specifically, our contribution highlights significant benefits obtained through (i) CXR-specific pretraining, followed by transferring and fine-tuning the learned knowledge to improve COVID-19 detection performance; (ii) using ensembles of the fine-tuned models to further improve performance over individual constituent models; (iii) performing statistical analyses at various learning stages to validate results; (iv) interpreting learned individual and ensemble model behavior through class-selective relevance mapping (CRM)-based region of interest (ROI) localization; and (v) analyzing inter-reader variability and ensemble localization performance using Simultaneous Truth and Performance Level Estimation (STAPLE) methods. We find that ensemble approaches markedly improved classification and localization performance, and that inter-reader variability and performance level assessment help guide algorithm design and parameter optimization. To the best of our knowledge, this is the first study to construct ensembles, perform ensemble-based disease ROI localization, and analyze inter-reader variability and algorithm performance for COVID-19 detection in CXRs.
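
The idea of combining fine-tuned models into an ensemble can be illustrated with a minimal sketch. The abstract does not specify which ensemble strategy the authors used, so the example below assumes simple (optionally weighted) averaging of per-model class probabilities. The function name `average_ensemble`, the three model outputs, and the class ordering are hypothetical and serve only as illustration.

```python
import numpy as np

def average_ensemble(prob_list, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   optional per-model weights; uniform weights if None.
    Averaging is an assumed strategy, not necessarily the one in the paper.
    """
    probs = np.stack(prob_list, axis=0)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()    # normalize so the weights sum to 1
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(weights, probs, axes=1)

# Hypothetical softmax outputs of three fine-tuned models on two CXRs.
# Column 0 and column 1 stand for two illustrative classes.
model_a = np.array([[0.9, 0.1], [0.4, 0.6]])
model_b = np.array([[0.7, 0.3], [0.2, 0.8]])
model_c = np.array([[0.8, 0.2], [0.6, 0.4]])

ensemble_probs = average_ensemble([model_a, model_b, model_c])
ensemble_labels = ensemble_probs.argmax(axis=1)  # predicted class per image
```

In this toy case the second image is classified differently by `model_c` than by the other two models, and the averaged prediction follows the majority. Passing `weights` would let better-performing models contribute more, which is one common variation on this scheme.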