Sensors (Jul 2025)
A Comprehensive Review of Explainable Artificial Intelligence (XAI) in Computer Vision
Abstract
Explainable Artificial Intelligence (XAI) is increasingly important in computer vision, aiming to connect complex model outputs with human understanding. This review provides a focused comparative analysis of representative XAI methods across four main categories: attribution-based, activation-based, perturbation-based, and transformer-based approaches, selected from a broader literature landscape. Attribution-based methods such as Grad-CAM highlight key input regions using gradients and feature activations. Activation-based methods analyze the responses of internal neurons or feature maps to identify which parts of the input activate specific layers or units, helping to reveal hierarchical feature representations. Perturbation-based techniques, such as RISE, assess feature importance through input modifications without accessing internal model details. Transformer-based methods, which use self-attention, offer global interpretability by tracing information flow across layers. We evaluate these methods using metrics such as faithfulness, localization accuracy, efficiency, and overlap with medical annotations. We also propose a hierarchical taxonomy to classify these methods, reflecting the diversity of XAI techniques. Results show that RISE achieves the highest faithfulness but is computationally expensive, limiting its use in real-time scenarios. Transformer-based methods perform well in medical imaging, with high Intersection over Union (IoU) scores, though interpreting attention maps requires care. These findings emphasize the need for context-aware evaluation and for hybrid XAI methods that balance interpretability and efficiency. The review concludes by discussing ethical and practical challenges, stressing the need for standardized benchmarks and domain-specific tuning.
Keywords