Applied Sciences (Mar 2024)

Automatic Medical Image Segmentation with Vision Transformer

Jie Zhang, Fan Li, Xin Zhang, Huaijun Wang, Xinhong Hei

DOI: https://doi.org/10.3390/app14072741
Journal volume & issue: Vol. 14, no. 7, p. 2741

Abstract

Automatic image segmentation is vital for computer-aided treatment planning, particularly for labelling lesions or infected areas. However, manual labelling of disease regions is inconsistent and time-consuming, and radiologists' annotations are highly subjective, often influenced by individual clinical experience. To address these issues, we propose a transformer learning strategy to automatically recognize infected areas in medical images. We first use a parallel partial decoder to aggregate high-level features and generate a global feature map. Explicit edge attention and implicit reverse attention are then applied to delineate boundaries and strengthen their representation. Additionally, to reduce the need for extensive labelled data, we propose a segmentation network that combines propagation and transformer architectures, requiring only a small amount of labelled data while exploiting largely unlabelled images. The attention mechanisms are integrated into convolutional networks with their global structures kept intact, and standalone transformers that operate directly on image patches can also achieve impressive segmentation performance. Our network improves learning capacity and attains higher-quality segmentation. We conducted a range of ablation studies to demonstrate the effectiveness of each model component. Experiments across various medical imaging modalities show that our model outperforms the most widely used segmentation models. The comprehensive results also show that our transformer architecture surpasses established frameworks in accuracy while better preserving natural anatomical variation. Both quantitatively and qualitatively, our model achieves higher overlap with ground-truth segmentations and improved boundary adherence.
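
The reverse-attention idea mentioned in the abstract can be illustrated with a minimal PyTorch-style sketch: a coarse prediction from a deeper stage is inverted so the refinement stage focuses on regions (typically boundaries) that the coarse map has not yet captured. This is an assumption-based illustration, not the authors' released code; the module name ReverseAttention, the layer sizes, and the tensor shapes are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttention(nn.Module):
    """Illustrative reverse-attention block (hypothetical, not the paper's code):
    the coarse prediction from a deeper stage is inverted so the network
    attends to pixels that are not yet confidently segmented."""

    def __init__(self, in_channels: int, mid_channels: int = 64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, feat: torch.Tensor, coarse_pred: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse single-channel map to the feature resolution.
        pred = F.interpolate(coarse_pred, size=feat.shape[2:],
                             mode="bilinear", align_corners=False)
        # Reverse attention: weight features by 1 - sigmoid(prediction),
        # emphasizing regions the coarse map misses.
        rev = 1.0 - torch.sigmoid(pred)
        refined = self.refine(feat * rev)
        # Residual connection: the refinement corrects the coarse prediction.
        return refined + pred


# Example usage with illustrative shapes: a 256-channel feature map and a
# lower-resolution coarse prediction from a deeper decoder stage.
feat = torch.randn(1, 256, 22, 22)
coarse = torch.randn(1, 1, 11, 11)
out = ReverseAttention(256)(feat, coarse)
print(out.shape)  # torch.Size([1, 1, 22, 22])
```

In this sketch the inverted prediction acts as a spatial gate, which is one common way such boundary-focused refinement is realized; the paper's actual formulation may differ in detail.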

Keywords