Current Directions in Biomedical Engineering (Sep 2022)

Saliency-assisted multi-label classification for explainable deep learning applications in endoscopic ENT navigation

  • Richard Bieck
  • Katharina Heuermann
  • Martin Sorge
  • Thomas Neumuth
  • Markus Pirlich

DOI: https://doi.org/10.1515/cdbme-2022-1152
Journal volume & issue: Vol. 8, no. 2, pp. 596–599

Abstract

Introduction: In endoscopic procedures of the nasal sinuses, a critical issue for classification tasks is the ambiguity of anatomical representations caused by the complex composition of the sinuses. We investigated the potential of multi-label, image-based classification of sinus landmark combinations, together with explainability methods for machine learning, in an assistance function at the application level. By combining image classification and pixel attribution in a navigation function, we provide the surgeon with label predictions and additional localization cues, i.e., the pixels of the input image most relevant to the model output.

Methods: We used 3500 label-annotated video sequences from 30 recorded sinus surgeries to fine-tune a pretrained ResNet50 as the feature extractor, together with a classification head trained with binary cross-entropy on one-hot encoded target vectors of landmark classes using the RAdam optimizer over 28–32 epochs. Image augmentation and a focal loss function were added to counter overfitting. An explainability function used the trained model to produce pixel attribution maps for the predicted classes of individual input images. These gradient maps were summed over all classes, and pixel values above 0.20 were clustered using weighted k-means based on the gradient value at each pixel coordinate. The resulting cluster centroids were then overlaid onto the endoscopic image together with the predicted landmark classes. Three surgeons investigated three different overlay scenarios in a validation study.

Results: The top-1 predictions reached a mean f1-score of 0.47, with the highest value at 0.71 and the lowest at 0.28. Despite overfitting mitigation, prediction results depended largely on over- and underrepresented classes.

Conclusion: The explainability function provided at the application level showed the strong potential of delivering visual cues from prediction results to surgeons at runtime to support human-machine interaction.
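
As a rough illustration of the training setup described in the Methods, the following is a minimal sketch assuming PyTorch and torchvision; the number of landmark classes, the learning rate, and the focal-loss parameters are illustrative assumptions, not values reported in the paper.

    # Sketch of the multi-label training setup: pretrained ResNet50 backbone,
    # new classification head, binary focal loss, RAdam optimizer.
    # NUM_CLASSES and all hyperparameters below are hypothetical.
    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_CLASSES = 14  # hypothetical count of sinus landmark classes

    # Pretrained ResNet50 as feature extractor with a fresh classification head.
    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

    def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
        """Simplified binary focal loss over multi-label target vectors."""
        bce = nn.functional.binary_cross_entropy_with_logits(
            logits, targets, reduction="none")
        p_t = torch.exp(-bce)  # probability assigned to the correct label bit
        return (alpha * (1 - p_t) ** gamma * bce).mean()

    optimizer = torch.optim.RAdam(model.parameters(), lr=1e-4)

    def train_step(images, targets):
        """One optimization step on a batch of frames and target vectors."""
        optimizer.zero_grad()
        loss = focal_loss(model(images), targets)
        loss.backward()
        optimizer.step()
        return loss.item()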
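
The explainability step (gradient maps summed over classes, thresholded at 0.20, then clustered with gradient-weighted k-means) might be sketched as follows. Plain input gradients and the cluster count are assumptions here, since the abstract does not name the specific pixel-attribution method used.

    # Sketch of the saliency clustering step: sum attribution over all classes,
    # keep pixels above the 0.20 threshold, and cluster their coordinates with
    # k-means weighted by gradient value. n_clusters is an assumption.
    import numpy as np
    import torch
    from sklearn.cluster import KMeans

    def saliency_centroids(model, image, n_clusters=3, threshold=0.20):
        """Return (row, col) cluster centroids of high-attribution pixels."""
        model.eval()
        image = image.clone().requires_grad_(True)   # 3 x H x W input frame
        logits = model(image.unsqueeze(0))
        logits.sum().backward()                      # sum over all classes

        grad = image.grad.abs().max(dim=0).values    # collapse channels to one map
        grad = grad / grad.max()                     # normalize to [0, 1]

        mask = grad > threshold                      # salient-pixel mask
        coords = np.argwhere(mask.numpy())           # (row, col) coordinates
        weights = grad[mask].numpy()                 # gradient value per pixel

        # Weighted k-means over pixel coordinates; assumes enough salient
        # pixels remain after thresholding to form n_clusters clusters.
        km = KMeans(n_clusters=n_clusters, n_init=10)
        km.fit(coords, sample_weight=weights)
        return km.cluster_centers_

The returned centroids would correspond to the localization cues overlaid onto the endoscopic frame alongside the predicted landmark labels.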

Keywords