IET Computer Vision (Mar 2024)
IDBNet: Improved differentiable binarisation network for natural scene text detection
Abstract
Text in natural scenes carries rich semantic information that helps people understand and analyse everyday things. This paper addresses two problems of natural scenes with complex backgrounds, the discrete spatial distribution of text and its variable geometric size, and proposes an end‐to‐end natural scene text detection method based on DBNet. The authors first use IResNet as the backbone network, which retains more text features without increasing the number of network parameters. Furthermore, a Transformer‐based module is introduced in the feature extraction stage to strengthen the correlation between high‐level feature pixels. Then, the authors add a spatial pyramid pooling structure at the end of feature extraction, which combines local and global features, enriches the expressive ability of the feature maps, and alleviates the detection limitations caused by the geometric size of features. Finally, to better integrate the features of each level, a dual attention module is embedded after multi‐scale feature fusion. Extensive experiments are conducted on the MSRA‐TD500, CTW1500, ICDAR2015, and MLT2017 data sets. The results show that IDBNet improves average precision, recall, and F‐measure compared with state‐of‐the‐art text detection methods and has higher predictive ability and practicability.
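For readers unfamiliar with the differentiable binarisation step that gives DBNet its name, the following is a minimal sketch of the approximate step function from the original DBNet formulation, B = 1 / (1 + exp(-k (P - T))), where P is the predicted probability map, T the predicted threshold map, and k an amplifying factor (50 in the original DBNet paper). The abstract does not state whether IDBNet modifies this step, so treating it as unchanged is an assumption, and the function names below are hypothetical.

```python
import math


def differentiable_binarisation(prob, thresh, k=50.0):
    """Approximate binarisation of one pixel: 1 / (1 + exp(-k * (prob - thresh))).

    prob   -- predicted text probability for the pixel (P)
    thresh -- predicted adaptive threshold for the pixel (T)
    k      -- amplifying factor; 50 follows the original DBNet paper
    """
    z = k * (prob - thresh)
    # Evaluate the sigmoid in a numerically stable way for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binarise_map(prob_map, thresh_map, k=50.0):
    """Apply the approximate binarisation element-wise to two 2-D lists."""
    return [
        [differentiable_binarisation(p, t, k) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob_map, thresh_map)
    ]


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    prob_map = [[0.9, 0.1], [0.5, 0.3]]
    thresh_map = [[0.3, 0.3], [0.5, 0.3]]
    for row in binarise_map(prob_map, thresh_map):
        print(row)
```

Because the function is smooth, gradients can flow through the binarisation during training, whereas a hard threshold would block them. Pixels well above their threshold map to values near 1, and pixels well below map to values near 0.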
Keywords