IEEE Access (Jan 2024)
Explainable Multi-Modal Deep Learning With Cross-Modal Attention for Diagnosis of Dyssynergic Defecation Using Abdominal X-Ray Images and Symptom Questionnaire
Abstract
Dyssynergic defecation (DD) is a type of functional constipation that requires specialized tests for diagnosis. These tests, however, are typically available only in tertiary care centers because they require devices not found elsewhere. In this work, we present explainable multi-modal deep learning models that can pre-screen patients for DD using affordable data accessible in small hospitals, i.e., abdominal X-ray images and symptom questionnaires; the output classifies whether DD is present or not. To enhance the model’s performance, we apply cross-modal attention to help the model find meaningful interactions between the two modalities. A convolutional block attention module (CBAM) is added to extract more informative semantic and spatial features from the images, and masking augmentation is applied to suppress irrelevant background in the images. Two explainable AI techniques, gradient-weighted class activation mapping (Grad-CAM) and deep Shapley additive explanations (DeepSHAP), are used to explain which image regions and which symptom responses matter for each patient. In our experiments, all models are evaluated on three patient-based bootstraps, and our model is compared with single-modal models and human experts. Results demonstrate that our multi-modal model outperforms the single-modal models, achieving the highest sensitivity, specificity, F1 score, and accuracy (87.37%, 77.01%, 82.17%, and 82.27%, respectively). In addition, our model outperforms human experts, demonstrating its potential to assist them in diagnosing DD. The model is a novel clinical tool that combines symptom and image data for a more accurate diagnosis of DD.
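To make the fusion idea concrete, the following is a minimal sketch of cross-modal attention between the two modalities, not the authors' implementation: questionnaire features act as the attention query over flattened image-patch features. All layer names, dimensions, the number of symptom items, and the single-query design are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Hypothetical sketch: symptom features attend over image patch features."""

    def __init__(self, dim=256, heads=4, n_symptoms=20):
        super().__init__()
        self.symptom_proj = nn.Linear(n_symptoms, dim)   # questionnaire -> embedding
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)              # DD present / absent

    def forward(self, image_tokens, symptoms):
        # image_tokens: (B, N, dim) flattened CNN feature-map patches
        # symptoms:     (B, n_symptoms) encoded questionnaire answers
        q = self.symptom_proj(symptoms).unsqueeze(1)     # (B, 1, dim) query
        fused, weights = self.attn(q, image_tokens, image_tokens)
        return self.classifier(fused.squeeze(1)), weights

# Toy usage with random inputs (assumed shapes)
model = CrossModalFusion()
logits, w = model(torch.randn(2, 49, 256), torch.randn(2, 20))
print(logits.shape, w.shape)  # torch.Size([2, 2]) torch.Size([2, 1, 49])
```

The returned attention weights indicate which image patches the symptom query attends to, which is one plausible way such a model could expose cross-modal interactions alongside Grad-CAM and DeepSHAP explanations.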
Keywords