JGH Open (Sep 2024)

Exploring vision transformers for classifying early Barrett's dysplasia in endoscopic images: A pilot study on white‐light and narrow‐band imaging

  • Jin L Tan,
  • Dileepa Pitawela,
  • Mohamed A Chinnaratha,
  • Andrawus Beany,
  • Enrik J Aguila,
  • Hsiang‐Ting Chen,
  • Gustavo Carneiro,
  • Rajvinder Singh

DOI
https://doi.org/10.1002/jgh3.70030
Journal volume & issue
Vol. 8, no. 9
pp. n/a – n/a

Abstract

Read online

Abstract Background and Aim Various deep learning models, based on convolutional neural network (CNN), have been shown to improve the detection of early esophageal neoplasia in Barrett's esophagus. Vision transformer (ViT), derived from natural language processing, has emerged as the new state‐of‐the‐art for image recognition, outperforming predecessors such as CNN. This pilot study explores the use of ViT to classify the presence or absence of early esophageal neoplasia in endoscopic images of Barrett's esophagus. Methods A BO dataset of 1918 images of Barrett's esophagus from 267 unique patients was used. The images were classified as dysplastic (D‐BO) or non‐dysplastic (ND‐BO). A pretrained vision transformer model, ViTBase16, was used to develop our classifier models. Three ViT models were developed for comparison based on imaging modality: white‐light imaging (WLI), narrow‐band imaging (NBI), and combined modalities. Performance of each model was evaluated based on accuracy, sensitivity, specificity, confusion matrices, and receiver operating characteristic curves. Results The ViT models demonstrated the following performance: WLI‐ViT (Accuracy: 92%, Sensitivity: 82%, Specificity: 95%), NBI‐ViT (Accuracy: 99%, Sensitivity: 97%, Specificity: 99%), and combined modalities‐ViT (Accuracy: 93%, Sensitivity: 87%, Specificity: 95%). Combined modalities‐ViT showed greater accuracy (94% vs 90%) and sensitivity (80% vs 70%) compared with WLI‐ViT when classifying WLI images on a subgroup testing set. Conclusion ViT exhibited high accuracy in classifying the presence or absence of EON in endoscopic images of Barrett's esophagus. ViT has the potential to be widely applicable to other endoscopic diagnoses of gastrointestinal diseases.

Keywords