Automatic Medical Image Segmentation with Vision Transformer

Jie Zhang; Fan Li; Xin Zhang; Huaijun Wang; Xinhong Hei

doi:10.3390/app14072741

Applied Sciences (Mar 2024)

Affiliations

Jie Zhang: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Fan Li: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Xin Zhang: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Huaijun Wang: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China
Xinhong Hei: School of Computer Science and Engineering, Xi’an University of Technology, Xi’an 710048, China

DOI: https://doi.org/10.3390/app14072741
Journal volume & issue: Vol. 14, no. 7, p. 2741

Abstract


Automatic image segmentation is vital for computer-aided treatment planning, particularly for labelling lesions or infected areas. However, manual labelling of disease regions is inconsistent and time-consuming, and radiologists’ assessments are highly subjective, often shaped by personal clinical experience. To address these issues, we propose a transformer-based learning strategy to automatically identify infected areas in medical images. We first use a parallel partial decoder to aggregate high-level features and generate a global feature map. Explicit edge attention and implicit reverse attention are then applied to model boundaries and enhance their representation. Additionally, to alleviate the need for extensive labeled data, we propose a segmentation network combining propagation and transformer architectures that requires only a small amount of labeled data while leveraging primarily unlabeled images. The attention mechanisms are integrated within convolutional networks, keeping their overall structures intact. Standalone transformers applied directly to sequences of image patches can also achieve impressive segmentation performance. Our network improves learning ability and attains higher-quality results. We conducted a variety of ablation studies to demonstrate the effectiveness of each model component. Experiments across various medical imaging modalities show that our model outperforms the most popular segmentation models. The comprehensive results also show that our transformer architecture surpasses established frameworks in accuracy while better preserving natural anatomical variation. Both quantitatively and qualitatively, our model achieves higher overlap with ground-truth segmentations and improved boundary adherence.
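
The abstract reports "overlap with ground truth segmentations" without naming a metric. As a minimal sketch, assuming the overlap is measured with the Dice coefficient (a common choice in medical image segmentation, but an assumption here), the quantity can be computed for two binary masks as follows. The function name `dice_coefficient` and the toy masks are hypothetical and not taken from the paper.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]],
                     eps: float = 1e-7) -> float:
    """Dice overlap between two binary masks of equal shape (rows of 0/1)."""
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")
    # Count pixels that are foreground in both masks.
    intersection = sum(
        p & t
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    pred_area = sum(sum(row) for row in pred)
    truth_area = sum(sum(row) for row in truth)
    # Dice = 2|A n B| / (|A| + |B|); eps avoids division by zero
    # when both masks are empty.
    return (2.0 * intersection + eps) / (pred_area + truth_area + eps)


# Toy 3x3 masks (hypothetical data, not results from the paper).
predicted = [[0, 1, 1],
             [0, 1, 0],
             [0, 0, 0]]
ground_truth = [[0, 1, 1],
                [0, 1, 1],
                [0, 0, 0]]

score = dice_coefficient(predicted, ground_truth)
print(f"Dice: {score:.3f}")
```

In this toy case the masks share 3 foreground pixels out of 3 predicted and 4 labelled, so the score is 6/7. A value of 1 would mean the predicted and labelled regions coincide exactly.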

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
