Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers

Bo Dong; Wenhai Wang; Deng-Ping Fan; Jinpeng Li; Huazhu Fu; Ling Shao

doi:10.26599/AIR.2023.9150015

CAAI Artificial Intelligence Research (Dec 2023)

Affiliations

Bo Dong: College of Computer Science, Nankai University, Tianjin 300350, China
Wenhai Wang: Shanghai Artificial Intelligence Laboratory, Shanghai 200232, China
Deng-Ping Fan: College of Computer Science, Nankai University, Tianjin 300350, China
Jinpeng Li: Computer Vision Lab, Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates
Huazhu Fu: Institute of High Performance Computing, Agency for Science, Technology and Research, Singapore 138632, Singapore
Ling Shao: UCAS-Terminus AI Lab, Terminus Group, Chongqing 400042, China

DOI: https://doi.org/10.26599/AIR.2023.9150015
Journal volume & issue: Vol. 2, p. 9150015

Abstract

Most polyp segmentation methods use convolutional neural networks (CNNs) as their backbone, which leads to two key issues when exchanging information between the encoder and decoder: (1) accounting for the differences in contribution between features at different levels, and (2) designing an effective mechanism for fusing these features. Unlike existing CNN-based methods, we adopt a transformer encoder, which learns more powerful and robust representations. In addition, considering the influence of image acquisition and the elusive properties of polyps, we introduce three standard modules: a cascaded fusion module (CFM), a camouflage identification module (CIM), and a similarity aggregation module (SAM). The CFM collects the semantic and location information of polyps from high-level features; the CIM captures polyp information disguised in low-level features; and the SAM extends the pixel features of the polyp area with high-level semantic position information to the entire polyp area, thereby effectively fusing cross-level features. The proposed model, named Polyp-PVT, effectively suppresses noise in the features and significantly improves their expressive capabilities. Extensive experiments on five widely adopted datasets show that the proposed model is more robust to various challenging situations (e.g., appearance changes, small objects, and rotation) than existing representative methods. The proposed model is available at https://github.com/DengPingFan/Polyp-PVT.
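
The data flow described in the abstract can be illustrated with a shape-only sketch. This is not the authors' implementation (see the repository linked above for that); it only tracks feature-map shapes through an encoder and the three modules. Several things here are assumptions not stated in the abstract: a four-stage pyramid at strides 4, 8, 16, and 32, the placeholder channel widths, treating stage 1 as the "low-level" feature and stages 2 to 4 as the "high-level" features, and the output shape of each module. All function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Feat(NamedTuple):
    """Shape-only stand-in for a feature map: name, channels, height, width."""
    name: str
    c: int
    h: int
    w: int


def pyramid_encoder(h: int, w: int) -> List[Feat]:
    # Assumption: four stages at strides 4, 8, 16, 32 (a PVT-style pyramid).
    # The channel widths are illustrative placeholders.
    strides = (4, 8, 16, 32)
    channels = (64, 128, 320, 512)
    return [
        Feat(f"x{i + 1}", c, h // s, w // s)
        for i, (s, c) in enumerate(zip(strides, channels))
    ]


def cim(low: Feat) -> Feat:
    # Hypothetical stand-in for the camouflage identification module:
    # refines the low-level map and keeps its shape.
    return Feat("cim", low.c, low.h, low.w)


def cfm(highs: List[Feat], out_c: int = 32) -> Feat:
    # Hypothetical stand-in for the cascaded fusion module: fuses the
    # high-level maps at the finest resolution among them.
    finest = max(highs, key=lambda f: f.h * f.w)
    return Feat("cfm", out_c, finest.h, finest.w)


def sam(high: Feat, low: Feat) -> Feat:
    # Hypothetical stand-in for the similarity aggregation module: combines
    # the two branches; here the result is assumed to live at the
    # high-level resolution (the low-level map would be downsampled).
    return Feat("sam", high.c, high.h, high.w)


def polyp_pvt_flow(h: int, w: int) -> Dict[str, Feat]:
    x1, x2, x3, x4 = pyramid_encoder(h, w)
    low = cim(x1)              # low-level branch
    high = cfm([x2, x3, x4])   # high-level branch
    fused = sam(high, low)     # cross-level fusion
    return {"low": low, "high": high, "fused": fused}


if __name__ == "__main__":
    for key, feat in polyp_pvt_flow(352, 352).items():
        print(key, feat)
```

Calling `polyp_pvt_flow` with a different input size shows how the resolution of each branch scales under these assumed strides; the 352 x 352 input is only an example value.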

Published in CAAI Artificial Intelligence Research

ISSN: 2097-194X (Print)
Publisher: Tsinghua University Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.sciopen.com/journal/2097-194X
