Attention-Based Scene Text Detection on Dual Feature Fusion

Yuze Li; Wushour Silamu; Zhenchao Wang; Miaomiao Xu

doi:10.3390/s22239072

Sensors (Nov 2022)

Affiliations

Yuze Li: Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Wushour Silamu: Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Zhenchao Wang: Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Miaomiao Xu: Xinjiang Multilingual Information Technology Laboratory, Xinjiang Multilingual Information Technology Research Center, College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China

DOI: https://doi.org/10.3390/s22239072
Journal volume & issue: Vol. 22, no. 23
p. 9072

Abstract

Segmentation-based scene text detection algorithms handle text of arbitrary shape and extreme aspect ratio well, owing to their pixel-level description and fine post-processing. However, insufficient use of semantic and spatial information limits the network's classification and localization capabilities, and existing scene text detection methods lose important feature information when extracting features from each network layer. To solve this problem, the Attention-based Dual Feature Fusion Model (ADFM) is proposed. Its Bi-directional Feature Fusion Pyramid Module (BFM) first adds stronger semantic information to the higher-resolution feature maps through a top-down process, and then reduces the aliasing effects introduced by that process through a bottom-up process, enhancing the representation of multi-scale text semantic information. Meanwhile, a position-sensitive Spatial Attention Module (SAM) is introduced between the two feature fusion stages. It operates on the single feature map from the top-down process that has the highest resolution and the strongest semantic features, and weights each spatial position by its relevance to text features, making the detection network more sensitive to text regions. Ablation experiments verified the effectiveness of each ADFM module, and the model was compared with recent scene text detection methods on several publicly available datasets.
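The two-stage fusion described in the abstract can be sketched in plain Python to show the data flow. This is a minimal illustration under stated assumptions, not the authors' implementation: feature maps are nested lists (channels x height x width), each pyramid level is half the resolution of the one before it, upsampling is nearest-neighbour, downsampling is 2x2 average pooling, fusion is element-wise addition with no learned convolutions, and the attention step is a simplified CBAM-style stand-in that weights each position by the sigmoid of the channel mean plus the channel max. The paper's actual SAM may differ. All helper names (`upsample2x`, `downsample2x`, `spatial_attention`, `dual_fusion`) are hypothetical.

```python
import math
from typing import List

# Assumed representation: a feature map is channels x height x width nested lists.
FeatureMap = List[List[List[float]]]


def upsample2x(fm: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each row and column is repeated twice."""
    return [
        [[row[x // 2] for x in range(2 * len(row))] for row in ch for _ in range(2)]
        for ch in fm
    ]


def downsample2x(fm: FeatureMap) -> FeatureMap:
    """2x2 average pooling with stride 2 (assumes even height and width)."""
    out = []
    for ch in fm:
        h, w = len(ch), len(ch[0])
        out.append([
            [(ch[y][x] + ch[y][x + 1] + ch[y + 1][x] + ch[y + 1][x + 1]) / 4.0
             for x in range(0, w, 2)]
            for y in range(0, h, 2)
        ])
    return out


def add(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps with identical shapes."""
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def spatial_attention(fm: FeatureMap) -> FeatureMap:
    """Simplified stand-in for SAM: scale every position by
    sigmoid(mean over channels + max over channels). No learned weights."""
    h, w = len(fm[0]), len(fm[0][0])
    weights = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [ch[y][x] for ch in fm]
            s = sum(vals) / len(vals) + max(vals)
            row.append(1.0 / (1.0 + math.exp(-s)))
        weights.append(row)
    return [[[ch[y][x] * weights[y][x] for x in range(w)] for y in range(h)]
            for ch in fm]


def dual_fusion(features: List[FeatureMap]) -> List[FeatureMap]:
    """features[0] is the finest level; each later level has half the resolution."""
    # Stage 1 (top-down): push coarse semantics into the finer maps.
    td = [None] * len(features)
    td[-1] = features[-1]
    for i in range(len(features) - 2, -1, -1):
        td[i] = add(features[i], upsample2x(td[i + 1]))
    # Between stages: attention on the finest, most semantic top-down map.
    td[0] = spatial_attention(td[0])
    # Stage 2 (bottom-up): feed the refined fine-level detail back into coarser maps.
    bu = [td[0]]
    for i in range(1, len(features)):
        bu.append(add(td[i], downsample2x(bu[i - 1])))
    return bu


def const_map(c: int, h: int, w: int, v: float) -> FeatureMap:
    """Helper for the demo: a c x h x w map filled with the value v."""
    return [[[v] * w for _ in range(h)] for _ in range(c)]


levels = [const_map(2, 8, 8, 0.0), const_map(2, 4, 4, 0.0), const_map(2, 2, 2, 1.0)]
out = dual_fusion(levels)
print([(len(fm), len(fm[0]), len(fm[0][0])) for fm in out])
# → [(2, 8, 8), (2, 4, 4), (2, 2, 2)]
```

Because the attention step sits on the finest top-down map before the bottom-up pass, the reweighted positions propagate into every coarser output level rather than affecting only one scale.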

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
