Applied Sciences (Jul 2025)
Ulcerative Severity Estimation Based on Advanced CNN–Transformer Hybrid Models
Abstract
Neural network-based classification of endoscopy images plays a key role in diagnosing gastrointestinal diseases. However, current models for estimating ulcerative colitis (UC) severity still leave room for improvement in performance, which motivates more advanced and accurate solutions. This study applies a state-of-the-art hybrid neural network architecture, which combines convolutional neural networks (CNNs) and transformer models, to the classification of intestinal endoscopy images, using the largest publicly available annotated UC dataset. Ten-fold cross-validation is performed on the LIMUC dataset using CoAtNet models trained with the Class Distance Weighted Cross-Entropy (CDW-CE) loss function. The best model is compared against pure CNN and pure transformer baselines using quadratically weighted kappa (QWK) and macro F1 for full Mayo score classification, and kappa and F1 scores for remission classification. The CoAtNet models outperform both pure CNN and pure transformer models. The most effective model, CoAtNet_2, improves classification accuracy by 1.76% and QWK by 1.46% over the previous state-of-the-art models on the LIMUC dataset. Other metrics, including the F1 score, also show clear improvements. These experiments show that the CoAtNet model, which integrates convolutional and transformer components, improves UC assessment from endoscopic images and strengthens the role of AI in computer-aided diagnosis.
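For readers unfamiliar with the main evaluation metric, the following is a minimal sketch of quadratically weighted kappa for ordinal labels. It is an illustration only, not the evaluation code used in the study. The function name `quadratic_weighted_kappa` and the default of four classes (endoscopic Mayo grades 0 to 3) are assumptions made for this example.

```python
def quadratic_weighted_kappa(y_true, y_pred, num_classes=4):
    """Quadratically weighted Cohen's kappa for integer ordinal labels.

    num_classes=4 is an assumption for this sketch (Mayo grades 0-3).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    k = num_classes
    # Confusion matrix: observed[i][j] counts samples with true grade i, predicted grade j.
    observed = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1

    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # The penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (k - 1) ** 2
            weighted_observed += weight * observed[i][j]
            # Expected count under chance agreement (independent marginals).
            weighted_expected += weight * row_totals[i] * col_totals[j] / n

    if weighted_expected == 0.0:
        # Every label and prediction falls in one class, so kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    true_grades = [0, 0, 3, 3]
    print(quadratic_weighted_kappa(true_grades, [0, 1, 3, 3]))  # one adjacent-grade error
    print(quadratic_weighted_kappa(true_grades, [0, 3, 3, 3]))  # one distant-grade error
```

Because the weights are quadratic in the grade distance, confusing Mayo 0 with Mayo 3 lowers the score far more than confusing Mayo 0 with Mayo 1, which is why the metric suits ordinal severity grading.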
Keywords