IEEE Access (Jan 2024)
QT-UNet: A Self-Supervised Self-Querying All-Transformer U-Net for 3D Segmentation
Abstract
With reliable performance and linear time complexity, Vision Transformers such as the Swin Transformer are gaining popularity in the field of Medical Image Computing (MIC). Effective volumetric segmentation models for brain tumours include VT-UNet, which combines conventional UNets with Swin Transformers using a unique encoder-decoder Cross-Attention (CA) paradigm. Self-Supervised Learning (SSL) has also seen growing adoption in computer vision domains such as MIC, where labelled training data is often scarce. In this paper, we introduce the Querying Transformer UNet (QT-UNet), a model that brings these advancements together: an all-Swin Transformer UNet with an encoder-decoder CA mechanism strengthened by SSL. To evaluate the potential of QT-UNet as a generic volumetric segmentation model, we test it extensively on several MIC datasets. Our best model achieves an average Dice score of 88.61 and a Hausdorff Distance of 4.85mm on the Brain Tumour Segmentation (BraTS) 2021 dataset, making it competitive with the state of the art while using 40% fewer FLOPs than the baseline VT-UNet. Results on Beyond The Cranial Vault (BTCV) and the Medical Segmentation Decathlon (MSD) are poor, but we validate the effectiveness of our new CA mechanism and find that the SSL pipeline is most effective when pre-trained with our CT-SSL dataset. The code can be found at https://github.com/AndreasHaaversen/QT-UNet.
Keywords