Scientific Reports (Aug 2021)

Lung nodule detection in chest X-rays using synthetic ground-truth data comparing CNN-based diagnosis to human performance

  • Manuel Schultheiss,
  • Philipp Schmette,
  • Jannis Bodden,
  • Juliane Aichele,
  • Christina Müller-Leisse,
  • Felix G. Gassert,
  • Florian T. Gassert,
  • Joshua F. Gawlitza,
  • Felix C. Hofmann,
  • Daniel Sasse,
  • Claudio E. von Schacky,
  • Sebastian Ziegelmayer,
  • Fabio De Marco,
  • Bernhard Renger,
  • Marcus R. Makowski,
  • Franz Pfeiffer,
  • Daniela Pfeiffer

DOI
https://doi.org/10.1038/s41598-021-94750-z
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 10

Abstract


We present a method to generate synthetic thorax radiographs with realistic nodules from CT scans, together with perfect ground-truth knowledge. We evaluated the detection performance of nine radiologists and two convolutional neural networks (CNNs) in a reader study. Nodules were artificially inserted into the lung of a CT volume, and synthetic radiographs were obtained by forward-projecting the volume. Because the synthetic data provide accurate ground-truth labels for every nodule, our framework allowed a detailed evaluation of the performance of both CAD systems and radiologists. Radiographs for network training (U-Net and RetinaNet) were generated from 855 CT scans of a public dataset. For the reader study, 201 radiographs were generated from 21 nodule-free CT scans by varying the positions, sizes, and number of inserted nodules. On average, the nine radiologists achieved 248.8 true positive detections, 51.7 false positives, and 121.2 false negatives. The best-performing CAD system achieved 268 true positives, 66 false positives, and 102 false negatives. The corresponding weighted alternative free-response operating characteristic figure of merit (wAFROC FOM) ranged from 0.54 to 0.87 for the radiologists, compared with 0.81 (CI 0.75–0.87) for the best-performing CNN. The CNN did not perform significantly better than the combined average of the nine readers (p = 0.49). Paramediastinal nodules accounted for most false positive and false negative detections by readers, which can be explained by the presence of more tissue in this area.
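
To make the reported counts easier to compare, the following minimal sketch derives pooled sensitivity and precision from the true positive, false positive, and false negative numbers given in the abstract. The function name `detection_rates` is hypothetical, and these derived rates are not reported in the abstract itself; they are simple ratios of the pooled counts and do not replace the wAFROC analysis.

```python
def detection_rates(tp, fp, fn):
    """Return (sensitivity, precision) computed from pooled detection counts.

    Hypothetical helper for illustration; not part of the published method.
    """
    sensitivity = tp / (tp + fn)  # fraction of inserted nodules that were found
    precision = tp / (tp + fp)    # fraction of marked findings that were real nodules
    return sensitivity, precision


# Counts as stated in the abstract (reader values are averages over nine radiologists).
readers = detection_rates(tp=248.8, fp=51.7, fn=121.2)
cnn = detection_rates(tp=268, fp=66, fn=102)

for name, (sens, prec) in [("radiologists (mean)", readers), ("best CNN", cnn)]:
    print(f"{name}: sensitivity={sens:.3f}, precision={prec:.3f}")
```

In both cases true positives plus false negatives sum to 370, so the two sensitivities share the same denominator and can be compared directly.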