Mayo Clinic Proceedings: Digital Health (Jun 2023)

Assessment of Performance, Interpretability, and Explainability in Artificial Intelligence–Based Health Technologies: What Healthcare Stakeholders Need to Know

  • Line Farah, PharmD,
  • Juliette M. Murris, MSc,
  • Isabelle Borget, PhD, PharmD,
  • Agathe Guilloux, PhD,
  • Nicolas M. Martelli, PhD, PharmD,
  • Sandrine I.M. Katsahian, MD, PhD

Journal volume & issue
Vol. 1, no. 2
pp. 120 – 138

Abstract

This review aimed to specify the concepts that are essential to the development of medical devices (MDs) that incorporate artificial intelligence (AI), referred to here as AI-based MDs, and to shed light on how algorithm performance, interpretability, and explainability are key assets. First, a literature review was performed to determine, from the existing guidelines, the key criteria needed for a health technology assessment of AI-based MDs. Then, we analyzed the existing methodologies for assessing the criteria selected in that review. The scoping review revealed that health technology assessment agencies have highlighted different criteria, 3 of which are important for reinforcing confidence in AI-based MDs: performance, interpretability, and explainability. We give recommendations on how and when to evaluate performance on the basis of the model structure and available data. In addition, because interpretability and explainability can be difficult to define mathematically, we describe existing ways to support their evaluation. We also provide a decision support flowchart to identify the anticipated regulatory requirements for the development and assessment of AI-based MDs. Explainability and interpretability techniques are becoming increasingly important to health technology assessment agencies as a means of holding stakeholders more accountable for the decisions made by AI-based MDs. Identifying 3 main assessment criteria for AI-based MDs according to health technology assessment guidelines led us to propose a set of tools and methods to help understand how and why machine learning algorithms work and how they arrive at their predictions.