IEEE Access (Jan 2024)
Improving Endoscopic Image Analysis: Attention Mechanism Integration in Grid Search Fine-Tuned Transfer Learning Model for Multi-Class Gastrointestinal Disease Classification
Abstract
Continuous changes in lifestyle and dietary habits are driving an increase in gastrointestinal (GI) diseases, with dietary changes being a major contributor to a variety of bowel problems. Around two million people worldwide die from GI diseases. Endoscopy is a medical imaging technology that helps diagnose GI diseases such as polyps and esophagitis. Manual diagnosis from endoscopic images is time-consuming; hence, computer-aided techniques are now widely used for accurate and fast GI disease diagnosis. In this paper, the Kvasir dataset of 4000 endoscopic images, comprising 500 images for each of the eight gastrointestinal tract disease classes, has been classified using seven grid search fine-tuned transfer learning models: ResNet101, InceptionV3, InceptionResNetV2, Xception, DenseNet121, MobileNetV2, and ResNet50. The grid search algorithm has been used to determine the architectural and fine-tuning hyperparameters. The fine-tuned ResNet101 model performed the best, with a learning rate of 0.001 and a batch size of 32 for the SGD optimizer at 40 epochs. These hyperparameters were optimized through grid search along with a new set of layers added to the model: one flatten layer, two dropout layers, and five dense layers. The grid search fine-tuned ResNet101 model obtained an accuracy of 0.90, a precision of 0.92, a recall of 0.92, and an F1-score of 0.91. Further, the grid search fine-tuned ResNet101 model was integrated with an attention mechanism to enhance performance by focusing on essential image features, which is particularly useful in medical imaging, where some regions may contain vital diagnostic information. The proposed grid search fine-tuned and attention mechanism integrated ResNet101 model achieved an accuracy of 0.935, a precision of 0.93, a recall of 0.94, and an F1-score of 0.93.
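The grid search step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the search space, the helper `grid_search`, and the scoring function `toy_evaluate` are all hypothetical. In practice the scoring function would train a fine-tuned model with the given hyperparameters and return its validation accuracy; here it is a synthetic placeholder so the example runs on its own.

```python
import math
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid and return the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the abstract does not list the candidate values.
param_grid = {
    "learning_rate": [0.01, 0.001, 0.0001],
    "batch_size": [16, 32, 64],
    "optimizer": ["sgd", "adam"],
}


def toy_evaluate(params):
    """Synthetic stand-in for 'train the model, return validation accuracy'.

    The formula is arbitrary and unrelated to any real training run.
    """
    lr_penalty = abs(math.log10(params["learning_rate"]) + 3)
    batch_penalty = abs(params["batch_size"] - 32) / 32
    optimizer_bonus = 0.1 if params["optimizer"] == "sgd" else 0.0
    return optimizer_bonus - lr_penalty - batch_penalty


best_params, best_score = grid_search(param_grid, toy_evaluate)
print(best_params)
```

The cost of an exhaustive search grows with the product of the candidate list sizes (18 combinations in this sketch), so each added hyperparameter multiplies the number of training runs.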
Keywords