Advanced grad-CAM extensions for interpretable aphasia speech keyword classification: Bridging the gap in impaired speech with XAI

Gowri Prasood Usha; John Sahaya Rani Alex

Results in Engineering (Dec 2024)

Affiliations

Gowri Prasood Usha: School of Electronics Engineering, Vellore Institute of Technology, Chennai 600127, India
John Sahaya Rani Alex (corresponding author): School of Electronics Engineering, Vellore Institute of Technology, Chennai 600127, India

Journal volume & issue: Vol. 24
p. 103414

Abstract

Aphasia, a language disorder caused by brain injury, poses significant challenges for speech recognition and classification because of irregular speech patterns. Although standard Grad-CAM (Gradient-weighted Class Activation Mapping) is widely used for model interpretation, its application to impaired speech remains largely unexplored. To address this gap, we introduce a set of enhanced Grad-CAM extensions, namely Enhanced Directional Grad-CAM (ED-GCAM), Multi-Scale Channel-wise Grad-CAM (MSCW-GCAM), Stochastic Gradient-Dropout Integrated Grad-CAM (SGD-GCAM), and Enhanced Hierarchical Filtered Grad-CAM (EH-FCAM), to improve interpretability and performance in aphasia speech keyword classification. When applied to attention-based CNN models, these techniques generate more focused, class-specific heatmaps, providing a deeper understanding of model behaviour, particularly for noisy and impaired speech. Our results show that the enhanced methods outperform standard Grad-CAM by offering more detailed and meaningful explanations, which is critical for interpreting models applied to aphasia speech. We evaluate the approaches qualitatively and quantitatively, using perturbation-based trustworthiness, infidelity, and sufficiency scores as quantitative metrics. Among the techniques, ED-GCAM performed best. The proposed methods significantly improve the accuracy and transparency of speech processing models, with potential implications for clinical applications.
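For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of standard Grad-CAM only; it does not implement ED-GCAM, MSCW-GCAM, SGD-GCAM, or EH-FCAM, whose details are not given here. The function name `grad_cam` and the nested-list input layout are assumptions made for illustration: the feature maps of one convolutional layer and the gradients of the class score with respect to them are assumed to have been extracted from the model beforehand.

```python
def grad_cam(activations, gradients):
    """Standard Grad-CAM heatmap (illustrative sketch, not the paper's code).

    activations, gradients: nested lists of shape [K][H][W] holding the
    feature maps of one conv layer and the gradient of the class score
    with respect to them. Both are assumed to be precomputed.
    """
    num_channels = len(activations)
    height, width = len(activations[0]), len(activations[0][0])

    # Channel weights: global average of each channel's gradient map.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(num_channels)
    ]

    # Weighted sum of the feature maps, followed by ReLU so that only
    # regions with a positive influence on the class score remain.
    cam = [
        [
            max(0.0, sum(weights[k] * activations[k][i][j]
                         for k in range(num_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam
```

In a speech setting the two spatial axes would typically correspond to frequency and time of a spectrogram-like input, though that mapping is an assumption here rather than something the abstract specifies.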

Published in Results in Engineering

ISSN: 2590-1230 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Technology
Website: https://www.journals.elsevier.com/results-in-engineering
