Applied Sciences (Aug 2022)
Rwin-FPN++: Rwin Transformer with Feature Pyramid Network for Dense Scene Text Spotting
Abstract
Scene text spotting has made tremendous progress with in-depth research on deep convolutional neural networks (DCNNs). Previous approaches mainly focus on spotting arbitrary-shaped scene text, but they struggle to achieve satisfactory results on dense scene text containing various instances of bending, occlusion, and uneven lighting. To address this problem, we propose an approach called Rwin-FPN++, which incorporates the long-range dependency modeling of the Rwin Transformer into the feature pyramid network (FPN) to effectively enhance the capability and generalization of the FPN. Specifically, we first propose the rotated-windows-based Transformer (Rwin) to enhance the rotation invariance of self-attention. Second, we attach the Rwin Transformer to each level of the feature pyramid to extract global self-attention context for each feature map produced by the FPN. Third, we fuse these pyramid levels by upsampling to predict the score matrix and keypoint matrix of the text regions. Fourth, a simple post-processing step precisely merges the pixels in the score matrix and keypoint matrix to obtain the final segmentation results. Finally, we use a recurrent neural network to recognize each segmented region and thus obtain the final spotting results. To evaluate the performance of our Rwin-FPN++ network, we construct a dense scene text dataset with various shapes and occlusions from the wiring of terminal blocks in substation panel cabinets. We train our Rwin-FPN++ network on public datasets and then evaluate its performance on our dense scene text dataset. Experiments demonstrate that our Rwin-FPN++ network achieves an F-measure of 79% and outperforms all other methods in F-measure by at least 2.8%. These gains stem from the better rotation invariance and long-range dependency modeling of our proposed method.
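As a concrete illustration of the pipeline summarized above, the following minimal PyTorch sketch attaches a self-attention block to each pyramid level, fuses the levels by upsampling, and predicts the score and keypoint matrices. This is not the authors' implementation: plain global self-attention stands in for the rotated-window (Rwin) attention, and all module names, channel widths, and head counts are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LevelAttention(nn.Module):
    """Self-attention over one FPN level (a simplified stand-in for Rwin attention)."""
    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        tokens = self.norm(tokens)
        out, _ = self.attn(tokens, tokens, tokens)  # global self-attention context
        out = out.transpose(1, 2).reshape(b, c, h, w)
        return x + out                              # residual connection

class AttentionFPNHead(nn.Module):
    """Fuse attention-enriched pyramid levels and predict score/keypoint matrices."""
    def __init__(self, channels=64, num_levels=3):
        super().__init__()
        self.level_attn = nn.ModuleList(
            [LevelAttention(channels) for _ in range(num_levels)])
        self.score_head = nn.Conv2d(channels, 1, kernel_size=1)     # text/non-text score
        self.keypoint_head = nn.Conv2d(channels, 2, kernel_size=1)  # keypoint channels

    def forward(self, pyramid):
        target_size = pyramid[0].shape[-2:]         # finest level sets the resolution
        fused = 0
        for attn, feat in zip(self.level_attn, pyramid):
            feat = attn(feat)                       # per-level self-attention
            fused = fused + F.interpolate(
                feat, size=target_size, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.score_head(fused)), self.keypoint_head(fused)

# Usage: three hypothetical pyramid levels at decreasing spatial resolution.
levels = [torch.randn(1, 64, s, s) for s in (16, 8, 4)]
score, keypoints = AttentionFPNHead()(levels)
print(score.shape, keypoints.shape)  # (1, 1, 16, 16) and (1, 2, 16, 16)

The per-level residual attention mirrors the idea of enriching each FPN feature map with global context before fusion; the post-processing that merges score and keypoint pixels into segmentation regions, and the recurrent recognizer, are omitted from this sketch.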
Keywords