IEEE Access (Jan 2023)

A Two-Stage Method for Polyp Detection in Colonoscopy Images Based on Saliency Object Extraction and Transformers

  • Alan Carlos de Moura Lima,
  • Lisle Faray de Paiva,
  • Geraldo Braz,
  • Joao Dallyson S. de Almeida,
  • Aristofanes Correa Silva,
  • Miguel Tavares Coimbra,
  • Anselmo Cardoso de Paiva

DOI
https://doi.org/10.1109/ACCESS.2023.3297097
Journal volume & issue
Vol. 11
pp. 76108 – 76119

Abstract

The gastrointestinal tract is responsible for the entire digestive process. Several diseases, including colorectal cancer, can affect this pathway. Among the deadliest cancers, colorectal cancer is the second most common. It arises from benign tumors in the colon, rectum, and anus. These benign tumors, known as colorectal polyps, can be diagnosed and removed during colonoscopy. Early detection is essential to reduce the risk of cancer. However, approximately 28% of polyps are missed during this examination, mainly because of limitations in diagnostic techniques and image analysis methods. In recent years, computer-aided detection techniques for these lesions have been developed to improve detection quality during periodic examinations. This study presents an automatic two-stage method for polyp detection in colonoscopy images using transformers. In the first stage, a saliency map extraction model, supported by extracted depth maps, identifies possible polyp areas. The second stage detects polyps in the images extracted by the first stage, combined with the green and blue channels. Several experiments were performed using four public colonoscopy datasets. The best results for the polyp detection task reached 91% Average Precision on the CVC-ClinicDB dataset, 92% Average Precision on the Kvasir-SEG dataset, and 84% Average Precision on the CVC-ColonDB dataset. This study demonstrates that polyp detection in colonoscopy images can be efficiently performed using a combination of depth maps, salient object-extracted maps, and transformers.
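
The abstract states that the second stage receives the first-stage output combined with the green and blue channels, but it does not say how the three are arranged. The following is a minimal sketch of one plausible reading, assuming the saliency map takes the place of the red channel in a three-channel detector input. The function name `compose_detector_input`, the channel order, and the scaling of the saliency map to 8 bits are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def compose_detector_input(rgb, saliency):
    """Stack a saliency map with the green and blue channels of an image.

    rgb:      uint8 array of shape (H, W, 3), channels in R, G, B order.
    saliency: float array of shape (H, W) with values in [0, 1].

    Assumption (not from the paper): the saliency map replaces the red
    channel, giving a (H, W, 3) array ordered saliency, green, blue.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if saliency.shape != rgb.shape[:2]:
        raise ValueError("saliency must match the image height and width")

    # Scale the saliency map to the same 8-bit range as the colour channels.
    saliency_u8 = np.clip(saliency * 255.0, 0.0, 255.0).round().astype(np.uint8)

    # Keep green (index 1) and blue (index 2); drop red (index 0).
    return np.stack([saliency_u8, rgb[..., 1], rgb[..., 2]], axis=-1)
```

An array built this way has the same shape as an ordinary RGB image, so it could be passed to a detector that expects three-channel input without changing the detector's first layer.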

Keywords