Ophthalmology Science (Sep 2021)

Improving Interpretability in Machine Diagnosis

  • Xiaoshuang Shi, PhD,
  • Tiarnan D.L. Keenan, MD,
  • Qingyu Chen, PhD,
  • Tharindu De Silva, PhD,
  • Alisa T. Thavikulwat, MD,
  • Geoffrey Broadhead, MD,
  • Sanjeeb Bhandari, MD,
  • Catherine Cukras, MD,
  • Emily Y. Chew, MD,
  • Zhiyong Lu, PhD

Journal volume & issue
Vol. 1, no. 3
p. 100038

Abstract

Read online

Purpose: Manually identifying geographic atrophy (GA) presence and location on OCT volume scans can be challenging and time consuming. This study developed a deep learning model simultaneously (1) to perform automated detection of GA presence or absence from OCT volume scans and (2) to provide interpretability by demonstrating which regions of which B-scans show GA. Design: Med-XAI-Net, an interpretable deep learning model was developed to detect GA presence or absence from OCT volume scans using only volume scan labels, as well as to interpret the most relevant B-scans and B-scan regions. Participants: One thousand two hundred eighty-four OCT volume scans (each containing 100 B-scans) from 311 participants, including 321 volumes with GA and 963 volumes without GA. Methods: Med-XAI-Net simulates the human diagnostic process by using a region-attention module to locate the most relevant region in each B-scan, followed by an image-attention module to select the most relevant B-scans for classifying GA presence or absence in each OCT volume scan. Med-XAI-Net was trained and tested (80% and 20% participants, respectively) using gold standard volume scan labels from human expert graders. Main Outcome Measures: Accuracy, area under the receiver operating characteristic (ROC) curve, F1 score, sensitivity, and specificity. Results: In the detection of GA presence or absence, Med-XAI-Net obtained superior performance (91.5%, 93.5%, 82.3%, 82.8%, and 94.6% on accuracy, area under the ROC curve, F1 score, sensitivity, and specificity, respectively) to that of 2 other state-of-the-art deep learning methods. The performance of ophthalmologists grading only the 5 B-scans selected by Med-XAI-Net as most relevant (95.7%, 95.4%, 91.2%, and 100%, respectively) was almost identical to that of ophthalmologists grading all volume scans (96.0%, 95.7%, 91.8%, and 100%, respectively). Even grading only 1 region in 1 B-scan, the ophthalmologists demonstrated moderately high performance (89.0%, 87.4%, 77.6%, and 100%, respectively). Conclusions: Despite using ground truth labels during training at the volume scan level only, Med-XAI-Net was effective in locating GA in B-scans and selecting relevant B-scans within each volume scan for GA diagnosis. These results illustrate the strengths of Med-XAI-Net in interpreting which regions and B-scans contribute to GA detection in the volume scan.

Keywords