IEEE Access (Jan 2024)
LMD²F-Net: Layered Multi-Scale Dual-Branch Dual-Temporal Fusion Network for Medical Image Segmentation
Abstract
Image segmentation techniques play a crucial role in medical image analysis, directly impacting disease diagnosis, treatment planning, and efficacy evaluation. Although Convolutional Neural Networks (CNNs) and transformer-based approaches have made significant progress in this area, the inherent complexity of medical images, characterized by low contrast, fuzzy boundaries, and noise, makes automated segmentation challenging. We propose a new architecture, LMD²F-Net, which combines MaxViT’s multi-axis attention with the Swin Transformer’s global context modeling, enhancing both local feature extraction and global context understanding. In the decoding stage, we incorporate a multi-scale spatio-temporal fusion module (MBFM) to optimize feature fusion and strengthen the identification of key features in medical images. Additionally, we introduce a Dual Layer Fusion (DLF) module that bridges the encoder and decoder, efficiently fusing multi-level features via a cross-focusing mechanism. Experimental results on multiple challenging medical image segmentation datasets demonstrate that LMD²F-Net performs well across evaluation metrics, particularly on the Dice similarity coefficient and Hausdorff distance. These findings confirm the potential of LMD²F-Net to improve the accuracy and robustness of medical image segmentation and provide a valuable reference for future research and clinical practice.
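The DLF module described above fuses encoder and decoder features through a cross-focusing (cross-attention) mechanism. The abstract does not give the exact formulation, so the following is only a minimal single-head sketch of cross-attention fusion under assumed design choices: decoder features act as queries attending over encoder features (keys/values), with a residual connection; the function name `cross_attention_fuse` and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(enc, dec):
    """Hypothetical DLF-style fusion: decoder tokens (queries) attend
    over encoder tokens (keys/values); result is added residually.

    enc: (n_keys, d) encoder feature tokens
    dec: (n_queries, d) decoder feature tokens
    """
    d = enc.shape[-1]
    scores = dec @ enc.T / np.sqrt(d)      # (n_queries, n_keys) similarity
    attn = softmax(scores, axis=-1)        # rows sum to 1
    return dec + attn @ enc                # residual fusion of attended encoder features

# Example: fuse 16 decoder tokens with 16 encoder tokens of dimension 32.
rng = np.random.default_rng(0)
enc = rng.standard_normal((16, 32))
dec = rng.standard_normal((16, 32))
fused = cross_attention_fuse(enc, dec)     # shape (16, 32)
```

A real implementation would add learned query/key/value projections, multiple heads, and normalization layers; this sketch only illustrates how a cross-focusing bridge lets decoder features selectively pull in multi-level encoder context.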
Keywords