Franklin Open (Dec 2024)
Temporal dependency modeling for improved medical image segmentation: The R-UNet perspective
Abstract
In this study, we propose a modified version of the widely used UNet architecture that integrates recurrent blocks at each step of the encoder (down-sampling) and decoder (up-sampling) stages. The proposed Recurrent UNet (R-UNet) aims to improve semantic segmentation by allowing the model to capture temporal dependencies and long-range contextual information. R-UNet consists of two main components: a recurrent encoder and a recurrent decoder. The recurrent encoder is a series of convolutional and recurrent blocks that extract features from the input image and propagate them across time. The recurrent decoder is a similar series of convolutional and recurrent blocks that use the extracted features to generate the final segmentation mask. An attention mechanism at the bottleneck of the model enhances feature extraction. We evaluate R-UNet on multiple benchmark datasets covering liver segmentation, brain tumor detection, mitochondria segmentation, and lung imaging, as well as a proprietary lung CT COVID-19 dataset and various multi-organ imaging datasets. The experimental results show that R-UNet outperforms the standard UNet and several other state-of-the-art semantic segmentation models in terms of accuracy, achieving an overall accuracy of 97.2 % on the Mitochondria dataset, 97.83 % on the Liver dataset, 89.17 % on the Tumor dataset, and 97.22 % on the Lung dataset.
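As an illustration of how a recurrent block can widen the context seen by each pixel, the following is a minimal sketch in plain Python. It is not the R-UNet implementation: the abstract does not specify the form of the recurrence, so the update rule used here (a fixed feed-forward response added to a convolution of the previous state, followed by ReLU) is an assumption, and the names `conv3x3`, `recurrent_conv_block`, `k_in`, `k_rec`, and `steps` are hypothetical.

```python
def conv3x3(image, kernel):
    """Zero-padded 3x3 cross-correlation on a 2D list of floats."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # outside pixels count as zero
                        acc += image[y][x] * kernel[di + 1][dj + 1]
            out[i][j] = acc
    return out


def recurrent_conv_block(image, k_in, k_rec, steps=2):
    """Assumed update rule (illustrative only, not taken from the paper):

    h_0 = relu(conv(image, k_in))
    h_t = relu(conv(image, k_in) + conv(h_{t-1}, k_rec))
    """
    feed = conv3x3(image, k_in)  # feed-forward response, reused at every step
    state = [[max(0.0, v) for v in row] for row in feed]
    for _ in range(steps):
        rec = conv3x3(state, k_rec)  # recurrent response from the previous state
        state = [[max(0.0, f + r) for f, r in zip(f_row, r_row)]
                 for f_row, r_row in zip(feed, rec)]
    return state


# A single bright pixel at the centre of a 5x5 image.
image = [[0.0] * 5 for _ in range(5)]
image[2][2] = 1.0

identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
box = [[1.0] * 3 for _ in range(3)]

one_step = recurrent_conv_block(image, identity, box, steps=1)
two_steps = recurrent_conv_block(image, identity, box, steps=2)
```

In this toy setting, each recurrent step lets the 3x3 kernel reach one pixel further: after one step the corner of the 5x5 map is still unaffected by the centre pixel, while after two steps it is not. This is one way to picture how repeated application of the same block can gather longer-range context without adding new weights.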