IEEE Access (Jan 2024)
Natural Scene Text Detection With Multiscale Feature Augmentation and Attention Mechanisms
Abstract
Recently, the DB algorithm has drawn considerable attention in scene text detection because of its differentiable binarization module, which simplifies the complex post-processing of existing segmentation-based scene text detection approaches. However, DB is limited by its layer-wise multiscale feature representation, the loss of semantic information in the highest-level feature map, insufficient localization signals at the higher layers, and the limited scale robustness of its segmentation model. In this paper, we propose a novel scene text detector with multiscale feature augmentation and attention mechanisms (MFAAM). Specifically, Res2Net serves as the backbone network for extracting fine-grained multiscale features. For feature fusion, we construct the Deep Feature Enhancement (DFE) module, which extracts ratio-invariant spatial context information to reduce the semantic information loss in the top-level feature map. The Feature Pyramid Augmentation (FPA) module propagates lower-level localization information along a bottom-up path to strengthen the higher-level features. Furthermore, Attentional Feature Fusion (AFF) improves the scale robustness of the segmentation model by adaptively integrating multiscale features through attention mechanisms that use channel and spatial context. Experiments on the ICDAR2015 dataset validate the superiority of the proposed method. Notably, the proposed detector outperforms DB by 4.38 percent in precision, 2.88 percent in recall, and 3.55 percent in F-measure. Compared with state-of-the-art algorithms, the proposed algorithm achieves a higher F-measure.
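For readers unfamiliar with DB, the differentiable binarization step mentioned in the abstract can be illustrated with a minimal per-pixel sketch. It assumes the steep-sigmoid form used in the original DB work, B = 1 / (1 + exp(-k (P - T))), with amplification factor k = 50. The function name and the sample pixel values below are hypothetical and chosen only for illustration; this is not the implementation of the detector proposed here.

```python
import math


def differentiable_binarization(p: float, t: float, k: float = 50.0) -> float:
    """Approximate the hard step (p >= t) with a steep sigmoid.

    p: text probability of one pixel, in [0, 1]
    t: learned threshold of the same pixel, in [0, 1]
    k: amplification factor (50 is assumed here, following the DB paper)
    """
    return 1.0 / (1.0 + math.exp(-k * (p - t)))


if __name__ == "__main__":
    # Illustrative (probability, threshold) pairs, not measured data.
    for p, t in [(0.9, 0.3), (0.5, 0.5), (0.1, 0.3)]:
        b = differentiable_binarization(p, t)
        print(f"P={p:.1f} T={t:.1f} -> B={b:.6f}")
```

Because the sigmoid is smooth, the threshold can be trained jointly with the probability map, while the large k keeps the output close to a hard binary decision away from the threshold.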
Keywords