Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs

Behrendt Finn; Bhattacharya Debayan; Krüger Julia; Opfer Roland; Schlaefer Alexander

doi:10.1515/cdbme-2022-0009

Current Directions in Biomedical Engineering (Jul 2022)

Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs

Behrendt Finn,
Bhattacharya Debayan,
Krüger Julia,
Opfer Roland,
Schlaefer Alexander

Affiliations

Behrendt Finn: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology,Hamburg, Germany
Bhattacharya Debayan: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology,Hamburg, Germany
Krüger Julia: Jung Diagnostics GmbH,Hamburg, Germany
Opfer Roland: Jung Diagnostics GmbH,Hamburg, Germany
Schlaefer Alexander: Institute of Medical Technology and Intelligent Systems, Hamburg University of Technology,Hamburg, Germany

DOI: https://doi.org/10.1515/cdbme-2022-0009
Journal volume & issue: Vol. 8, no. 1
pp. 34 – 37

Abstract

Read online

Radiographs are a versatile diagnostic tool for the detection and assessment of pathologies, for treatment planning or for navigation and localization purposes in clinical interventions. However, their interpretation and assessment by radiologists can be tedious and error-prone. Thus, a wide variety of deep learning methods have been proposed to support radiologists interpreting radiographs. Mostly, these approaches rely on convolutional neural networks (CNN) to extract features from images. Especially for the multi-label classification of pathologies on chest radiographs (Chest X-Rays, CXR), CNNs have proven to be well suited. On the Contrary, Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images and interpretable local saliency maps which could add value to clinical interventions. ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present. While this leads to increased capacity, ViTs typically require an excessive amount of training data which represents a hurdle in the medical domain as high costs are associated with collecting large medical data sets. In this work, we systematically compare the classification performance of ViTs and CNNs for different data set sizes and evaluate more data-efficient ViT variants (DeiT). Our results show that while the performance between ViTs and CNNs is on par with a small benefit for ViTs, DeiTs outperform the former if a reasonably large data set is available for training.

Published in Current Directions in Biomedical Engineering

ISSN: 2364-5504 (Online)
Publisher: De Gruyter
Country of publisher: Germany
LCC subjects: Medicine
Website: https://www.degruyter.com/view/j/cdbme

About the journal

Abstract

Keywords