IEEE Access (Jan 2020)
Arbitrary Shape Natural Scene Text Detection Method Based on Soft Attention Mechanism and Dilated Convolution
Abstract
Natural scene text detection has attracted much attention in computer vision and is widely used in applications such as unmanned driving and robot sensing. Several methods have been proposed for horizontal and oriented text detection, but detecting text with irregular shapes and highly varying orientations remains a challenging problem. To tackle this problem, we propose a robust arbitrary-shape text detection method called Soft Dilated network (SDnet). The proposed method has two essential steps: (1) feature extraction by the backbone; (2) a post-processing approach that generates refined polygons or boundaries. In particular, the backbone is based on a soft attention mechanism and dilated convolution. The soft attention mechanism learns the importance of each feature channel, while dilated convolution effectively aggregates multi-scale contextual information without losing resolution and enhances the robustness of the network model. The proposed method can accurately detect curved text and efficiently discriminate between text and non-text areas. In addition, the Jaccard coefficient is used as the loss function to improve the post-processing capability of detecting sparsely arranged and arbitrary-shape text. Based on these techniques, the proposed method can effectively handle the detection of sparsely arranged, arbitrary-shape natural scene text. Experiments were conducted on three benchmark datasets: the curved-text datasets CTW1500 and Total-Text, and the oriented-text dataset ICDAR2015. The results show that, compared with state-of-the-art text detection methods, the proposed method is more robust and can find smaller text blocks in the image, owing to the loss function computed with the Jaccard coefficient. Furthermore, we performed multiple sets of ablation experiments to verify the effectiveness of the proposed method.
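As an illustration of the Jaccard-coefficient loss mentioned in the abstract, the following is a minimal sketch of one common "soft" formulation, computed between a predicted per-pixel text score map and a binary ground-truth mask. The function name `soft_jaccard_loss`, the flattened-list inputs, and the smoothing term `eps` are assumptions made for this example; the abstract does not specify the exact form used in SDnet.

```python
from typing import Sequence


def soft_jaccard_loss(pred: Sequence[float],
                      target: Sequence[float],
                      eps: float = 1e-6) -> float:
    """Return 1 - soft Jaccard (IoU) between a score map and a binary mask.

    pred   : per-pixel text probabilities in [0, 1], flattened (assumed layout)
    target : per-pixel ground-truth labels (0 or 1), flattened
    eps    : hypothetical smoothing term that avoids division by zero
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Soft union: |A| + |B| - |A intersect B|.
    union = sum(pred) + sum(target) - intersection

    # The loss is low when prediction and mask overlap strongly.
    return 1.0 - (intersection + eps) / (union + eps)


if __name__ == "__main__":
    mask = [1, 0, 0, 0]          # one text pixel in a 2x2 patch
    good = [0.9, 0.1, 0.0, 0.0]  # prediction concentrated on the text pixel
    poor = [0.1, 0.9, 0.0, 0.0]  # prediction concentrated on background
    print(soft_jaccard_loss(good, mask) < soft_jaccard_loss(poor, mask))
```

Because the ratio is normalized by the union rather than by the image area, a small text region contributes to the loss on the same scale as a large one, which is one plausible reading of why such a loss could help with small text blocks.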
Keywords