Insights into Imaging (Nov 2023)
Weakly supervised segmentation models as explainable radiological classifiers for lung tumour detection on CT images
Abstract
Purpose
Interpretability is essential for reliable convolutional neural network (CNN) image classifiers in radiological applications. We describe a weakly supervised segmentation model that learns to delineate the target object, trained with only image-level labels (“image contains object” or “image does not contain object”), presenting a different approach towards explainable object detectors for radiological imaging tasks.
Methods
A weakly supervised Unet architecture (WSUnet) was trained to learn lung tumour segmentation from image-level labelled data. WSUnet generates voxel probability maps with a Unet and then constructs an image-level prediction by global max-pooling, thereby facilitating image-level training. WSUnet’s voxel-level predictions were compared to traditional model interpretation techniques (class activation mapping, integrated gradients and occlusion sensitivity) in CT data from three institutions (training/validation: n = 412; testing: n = 142). Methods were compared using voxel-level discrimination metrics and clinical value was assessed with a clinician preference survey on data from external institutions.
Results
Despite the absence of voxel-level labels in training, WSUnet’s voxel-level predictions localised tumours precisely in both validation (precision: 0.77, 95% CI: [0.76–0.80]; dice: 0.43, 95% CI: [0.39–0.46]) and external testing (precision: 0.78, 95% CI: [0.76–0.81]; dice: 0.33, 95% CI: [0.32–0.35]). WSUnet’s voxel-level discrimination outperformed the best comparator in validation (area under precision recall curve (AUPR): 0.55, 95% CI: [0.49–0.56] vs. 0.23, 95% CI: [0.21–0.25]) and testing (AUPR: 0.40, 95% CI: [0.38–0.41] vs. 0.36, 95% CI: [0.34–0.37]). Clinicians preferred WSUnet predictions in most instances (clinician preference rate: 0.72, 95% CI: [0.68–0.77]).
Conclusion
Weakly supervised segmentation is a viable approach by which explainable object detection models may be developed for medical imaging.
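The pooling step described in the Methods, and the voxel-level precision and Dice scores reported in the Results, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 2D toy grid standing in for a CT volume, the binary cross-entropy loss and the 0.5 threshold are all assumptions made for illustration only.

```python
import math

def image_level_prediction(voxel_probs):
    """Global max-pooling: the image-level probability is the largest voxel probability."""
    return max(p for row in voxel_probs for p in row)

def binary_cross_entropy(p, label, eps=1e-7):
    """Image-level loss (assumed here); only the image-level label is required."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def precision_and_dice(voxel_probs, mask, threshold=0.5):
    """Voxel-level precision and Dice after thresholding (threshold is an assumption)."""
    tp = fp = fn = 0
    for prob_row, mask_row in zip(voxel_probs, mask):
        for p, m in zip(prob_row, mask_row):
            predicted = p >= threshold
            if predicted and m:
                tp += 1
            elif predicted:
                fp += 1
            elif m:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = 2 * tp + fp + fn
    dice = 2 * tp / denom if denom else 0.0
    return precision, dice

# Hypothetical voxel probability map, as a Unet-style backbone might produce.
probs = [
    [0.05, 0.10, 0.02],
    [0.08, 0.90, 0.60],
    [0.03, 0.20, 0.04],
]
# Hypothetical reference mask, used for evaluation only, never for training.
mask = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
]

image_prob = image_level_prediction(probs)              # the maximum voxel, 0.90
loss = binary_cross_entropy(image_prob, label=1)        # image-level label: "contains object"
precision, dice = precision_and_dice(probs, mask)

print(f"image-level probability: {image_prob:.2f}")
print(f"image-level loss: {loss:.4f}")
print(f"voxel precision: {precision:.2f}, Dice: {dice:.2f}")
```

Because the image-level output is simply the maximum voxel probability, the voxel that attains the maximum fully determines the image-level prediction. In the toy example, every voxel above the threshold lies inside the mask while one mask voxel is missed, so precision is higher than Dice.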
Critical relevance statement
WSUnet learns to segment images at voxel level, training only with image-level labels. A Unet backbone first generates a voxel-level probability map and then extracts the maximum voxel prediction as the image-level prediction. Thus, training uses only image-level annotations, reducing human workload. WSUnet’s voxel-level predictions provide a causally verifiable explanation for its image-level prediction, improving interpretability.
Key points
• Explainability and interpretability are essential for reliable medical image classifiers.
• This study applies weakly supervised segmentation to generate explainable image classifiers.
• The weakly supervised Unet inherently explains its image-level predictions at voxel level.
Graphical Abstract
Keywords