Data in Brief (Dec 2024)

Pixel-wise annotation for clear and contaminated regions segmentation in wireless capsule endoscopy images: A multicentre database

  • Vahid Sadeghi,
  • Yasaman Sanahmadi,
  • Maryam Behdad,
  • Alireza Vard,
  • Mohsen Sharifi,
  • Ahmad Raeisi,
  • Mehdi Nikkhah,
  • Alireza Mehridehnavi

Journal volume & issue
Vol. 57
p. 110927

Abstract


Wireless capsule endoscopy (WCE) can non-invasively visualize the small intestine, the most complicated segment of the gastrointestinal tract, to detect different types of abnormalities. However, its main drawback is the vast number of captured images that must be reviewed (more than 50,000 frames). The recorded images are not always clear: contaminating agents such as turbid materials and air bubbles degrade the visualization quality of WCE images. This can cause serious problems, such as reduced mucosal visualization, prolonged video reviewing time, and an increased risk of missed pathology. On the other hand, accurately quantifying the amount of turbid fluids and bubbles can indicate potential motility malfunction. To assist in developing computer vision-based techniques, we have constructed the first multicentre publicly available clear and contaminated annotated dataset by precisely segmenting 17,593 capsule endoscopy images from three different databases. In contrast to existing datasets, our dataset has been annotated at the pixel level, discriminating clear from contaminated regions and subsequently differentiating bubbles and turbid fluids from normal tissue. To create the dataset, we first selected all 2906 images of the reduced mucosal view class, covering different levels of contamination, and randomly selected 12,237 images from the normal class of the copyright-free CC BY 4.0 licensed small bowel capsule endoscopy (SBCE) images in the Kvasir capsule endoscopy database. To mitigate possible bias in that dataset and to increase the sample size, 2077 and 373 images were randomly chosen from the SEE-AI project and CECleanliness datasets, respectively, for subsequent annotation.
Randomly selected images have been annotated with the aid of ImageJ and ITK-SNAP software under the supervision of an expert SBCE reader with extensive experience in gastroenterology and endoscopy. For each image, two ground truth (GT) masks have been created: a binary mask in which each pixel is indexed into two classes (clear and contaminated), and a tri-colour mask in which each pixel is indexed into three classes (bubble, turbid fluid, and normal). To the best of the authors' knowledge, no capsule endoscopy reading software currently implements clear and contaminated region segmentation. The curated multicentre dataset can be used to implement applicable segmentation algorithms for identifying clear and contaminated regions and for discriminating bubbles and turbid fluids from normal tissue in the small intestine. Since the annotated images come from three different sources, they provide a diverse representation of clear and contaminated patterns in WCE images. This diversity is valuable for training models that are more robust to variations in data characteristics and generalize well across different subjects and settings. The inclusion of images from three different centres also allows for robust cross-validation, where computer vision-based models can be trained on one centre's annotated images and evaluated on the others.
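The two-class binary mask follows directly from the three-class mask: any pixel labelled bubble or turbid fluid is contaminated, and any normal-tissue pixel is clear. A minimal sketch of this collapse is shown below; note that the integer label values (`NORMAL`, `BUBBLE`, `TURBID`) are hypothetical placeholders for illustration and may differ from the actual encoding used in the dataset's GT masks.

```python
import numpy as np

# Hypothetical label values for the tri-colour mask; the actual
# encoding in the released GT masks may differ.
NORMAL, BUBBLE, TURBID = 0, 1, 2

def tricolour_to_binary(mask: np.ndarray) -> np.ndarray:
    """Collapse a tri-class mask (normal/bubble/turbid fluid) into the
    binary clear-vs-contaminated mask: pixels labelled bubble or turbid
    become contaminated (1); normal-tissue pixels stay clear (0)."""
    return np.where(np.isin(mask, (BUBBLE, TURBID)), 1, 0).astype(np.uint8)
```

Deriving the binary mask on the fly like this avoids keeping two copies of each annotation in memory when only the clear/contaminated distinction is needed.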

Keywords