IEEE Access (Jan 2024)

Human Body Segmentation in Wide-Angle Images Based on Fast Vision Transformers

  • Xiao Yu,
  • Yunfeng Hua,
  • Siyun Zhang,
  • Zhaocheng Xu

DOI
https://doi.org/10.1109/ACCESS.2024.3507272
Journal volume & issue
Vol. 12
pp. 178971 – 178981

Abstract

Achieving effective and efficient segmentation of human body regions in distorted images is of practical significance. Current methods rely on transformers to extract discriminative features; however, owing to their global attention mechanism, existing transformers miss fine-grained image details and incur high computational costs, resulting in subpar segmentation accuracy and slow inference. In this paper, we introduce the Human Spatial Prior Module (HSPM) and the Dynamic Token Pruning Module (DTPM). The HSPM is specifically designed to capture human features in distorted images, using dynamic methods to extract highly variable details. The DTPM accelerates inference by pruning unimportant tokens at each layer of the Vision Transformer (ViT). Unlike traditional pruning approaches, the pruned tokens are preserved in feature maps and selectively reactivated in subsequent network layers to improve model performance. To validate the effectiveness of the Vision Transformer for Distorted Images (ViT-DI), we extend the ADE20K dataset and conduct experiments on the constructed dataset and the Cityscapes dataset. Our method achieves an mIoU increase of 1.6 and an FPS increase of 4.4 on the extended ADE20K dataset, and an mIoU increase of 0.77 and an FPS increase of 2.9 on the Cityscapes dataset, while reducing computational cost by approximately 130 GFLOPs. Our dataset is available at: https://github.com/GitHubYuxiao/ViT-DI.
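The abstract's core pruning idea (drop low-importance tokens per layer, but preserve them so later layers can selectively reactivate them) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the importance scores, the fixed `keep_ratio`, and the function names here are illustrative assumptions.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    # tokens: (N, D) token embeddings for one image; scores: (N,) importance
    # per token (hypothetical scoring, e.g. derived from attention weights).
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    order = np.argsort(scores)[::-1]   # most important tokens first
    keep_idx = np.sort(order[:k])      # keep original sequence order
    drop_idx = np.sort(order[k:])
    # Pruned tokens are returned (not discarded) so they can be preserved.
    return tokens[keep_idx], keep_idx, tokens[drop_idx], drop_idx

def reactivate_tokens(kept, keep_idx, preserved, drop_idx, n, d):
    # Rebuild the full (N, D) sequence by writing both the processed kept
    # tokens and the preserved pruned tokens back to their original slots,
    # so a later layer can attend over the reactivated tokens again.
    full = np.zeros((n, d), dtype=kept.dtype)
    full[keep_idx] = kept
    full[drop_idx] = preserved
    return full
```

Only the kept tokens would pass through the expensive attention layers, which is where the FPS gain comes from; preserving the pruned tokens avoids the permanent information loss of hard pruning.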

Keywords