Scientific Reports (Mar 2024)

Utilizing adaptive deformable convolution and position embedding for colon polyp segmentation with a visual transformer

  • Mohamed Yacin Sikkandar,
  • Sankar Ganesh Sundaram,
  • Ahmad Alassaf,
  • Ibrahim AlMohimeed,
  • Khalid Alhussaini,
  • Adham Aleid,
  • Salem Ali Alolayan,
  • P. Ramkumar,
  • Meshal Khalaf Almutairi,
  • S. Sabarunisha Begum

DOI
https://doi.org/10.1038/s41598-024-57993-0
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 16

Abstract

Polyp detection is a challenging task in the diagnosis of colorectal cancer (CRC), and it demands clinical expertise because polyps are so diverse in nature. Recent years have seen the development of automated polyp detection systems that assist experts in early diagnosis, considerably reducing diagnosis time and diagnostic errors. In automated CRC diagnosis, polyp segmentation is an important step, typically carried out with deep learning segmentation models. Recently, Vision Transformers (ViTs) have been gradually replacing these models because of their ability to capture long-range dependencies among image patches. However, existing ViTs for polyp segmentation do not harness the inherent abilities of self-attention and instead incorporate complex attention mechanisms. This paper presents Polyp-Vision Transformer (Polyp-ViT), a novel model based on the conventional Transformer architecture and enhanced with adaptive mechanisms for feature extraction and positional embedding. Tested on the Kvasir-seg and CVC-Clinic DB datasets, Polyp-ViT achieves segmentation accuracies of 0.9891 ± 0.01 and 0.9875 ± 0.71, respectively, outperforming state-of-the-art models. Because it generalizes well, Polyp-ViT is a promising tool for polyp segmentation that can also be adapted to other medical image segmentation tasks.
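
The abstract reports segmentation accuracy as a mean with a spread but does not say how accuracy is defined. As a rough illustration only, the sketch below assumes it means pixel accuracy (the fraction of pixels whose predicted label matches the ground-truth mask), summarized as mean and standard deviation over images. The function name `pixel_accuracy` and the toy masks are hypothetical and are not taken from the paper.

```python
from statistics import mean, pstdev


def pixel_accuracy(pred, truth):
    """Fraction of pixels where the predicted label equals the ground truth.

    Assumption: binary masks given as equally shaped lists of rows,
    with 1 = polyp and 0 = background.
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("prediction and ground-truth masks must have the same shape")
    total = sum(len(row) for row in truth)
    correct = sum(
        p == t
        for p_row, t_row in zip(pred, truth)
        for p, t in zip(p_row, t_row)
    )
    return correct / total


# Toy 4x4 masks (illustrative data, not from Kvasir-seg or CVC-Clinic DB).
truth_a = [[0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
pred_a = [[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 0]]  # one pixel wrong
truth_b = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
pred_b = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]  # exact match

# Per-image accuracy, then mean and (population) standard deviation.
scores = [pixel_accuracy(pred_a, truth_a), pixel_accuracy(pred_b, truth_b)]
print(f"pixel accuracy: {mean(scores)} ± {pstdev(scores)}")
```

Pixel accuracy can look high on polyp images simply because most pixels are background, so overlap measures such as Dice or IoU are commonly reported alongside it. Whether the paper does so cannot be determined from the abstract.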

Keywords