Scientific Reports (Nov 2023)
Improving explainable AI with patch perturbation-based evaluation pipeline: a COVID-19 X-ray image analysis case study
Abstract
Recent advances in artificial intelligence (AI) have sparked interest in developing explainable AI (XAI) methods for clinical decision support systems, especially in translational research. Although XAI methods may enhance trust in black-box models, evaluating their effectiveness remains challenging: conventional approaches depend on human (expert) intervention and additional annotations, and automated strategies are largely absent. To enable a thorough assessment, we propose a patch perturbation-based approach to automatically evaluate the quality of explanations in medical imaging analysis. To eliminate the human effort required by conventional evaluation methods, our approach executes poisoning attacks during model retraining by generating both static and dynamic triggers. We then propose a comprehensive set of evaluation metrics for the model inference stage that assess explanations from multiple perspectives: correctness, completeness, consistency, and complexity. In addition, we present an extensive case study that applies widely used XAI methods to COVID-19 X-ray image classification tasks using the proposed evaluation strategy, together with a thorough review of existing XAI methods in medical imaging analysis and their evaluation availability. The proposed patch perturbation-based workflow offers model developers an automated and generalizable evaluation strategy to identify potential pitfalls and optimize their explainable solutions, while also aiding end users in comparing and selecting XAI methods that meet specific clinical needs in real-world clinical research and practice.
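The abstract does not specify implementation details, so the following is only a minimal illustrative sketch of the core idea it describes: inserting a static trigger patch into an image and scoring how well a saliency map attributes the (poisoned) prediction to that patch. The function names `add_static_patch` and `saliency_patch_overlap`, the patch parameters, and the random placeholder inputs are all assumptions introduced here for illustration, not the authors' actual pipeline.

```python
import numpy as np

def add_static_patch(image, patch_size=16, value=1.0, location=(0, 0)):
    """Insert a fixed square trigger patch into a (H, W) grayscale image.

    Returns the perturbed image and a boolean mask marking the patch region.
    """
    perturbed = image.copy()
    mask = np.zeros_like(image, dtype=bool)
    r, c = location
    perturbed[r:r + patch_size, c:c + patch_size] = value
    mask[r:r + patch_size, c:c + patch_size] = True
    return perturbed, mask

def saliency_patch_overlap(saliency, patch_mask, top_fraction=0.05):
    """Fraction of the top-k most salient pixels that fall inside the trigger.

    A higher value suggests the explanation correctly localizes the inserted
    trigger that drives the poisoned model's prediction.
    """
    k = max(1, int(top_fraction * saliency.size))
    top_idx = np.argsort(saliency, axis=None)[-k:]
    top_mask = np.zeros(saliency.size, dtype=bool)
    top_mask[top_idx] = True
    return float(np.mean(patch_mask.reshape(-1)[top_mask]))

# Example with random placeholders: in practice the image would be a chest
# X-ray and the saliency map would come from an XAI method (e.g. Grad-CAM)
# applied to the retrained (poisoned) classifier.
rng = np.random.default_rng(0)
xray = rng.random((224, 224))
poisoned, mask = add_static_patch(xray, patch_size=24, location=(10, 10))
saliency = rng.random((224, 224))  # placeholder for a real attribution map
print(f"overlap with trigger: {saliency_patch_overlap(saliency, mask):.3f}")
```

Under this reading, a correctness-style metric reduces to checking whether the explanation concentrates on the known trigger region, which is what makes the evaluation automatic and free of expert annotation.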