Texts as Lines: Text Detection with Weak Supervision

Mathematical Problems in Engineering. 2020;2020 DOI 10.1155/2020/3871897

 

Journal Title: Mathematical Problems in Engineering

ISSN: 1024-123X (Print); 1563-5147 (Online)

Publisher: Hindawi Limited

LCC Subject Category: Technology: Engineering (General). Civil engineering (General) | Science: Mathematics

Country of publisher: United Kingdom

Language of fulltext: English

Full-text formats available: PDF, HTML, ePUB, XML

 

AUTHORS


Weijia Wu (Zhejiang University, Key Laboratory for Biomedical Engineering of Ministry, Hangzhou, China)

Jici Xing (Zhengzhou University, School of Information Engineering Institute, Zhengzhou, China)

Cheng Yang (Zhejiang University, Key Laboratory for Biomedical Engineering of Ministry, Hangzhou, China)

Yuxing Wang (Zhejiang University, Key Laboratory for Biomedical Engineering of Ministry, Hangzhou, China)

Hong Zhou (Zhejiang University, Key Laboratory for Biomedical Engineering of Ministry, Hangzhou, China)

EDITORIAL INFORMATION

Blind peer review

Time From Submission to Publication: 26 weeks

 

ABSTRACT

Scene text detection methods based on deep learning have improved remarkably in recent years. Most of them train deep convolutional neural networks on full masks, which must be pixel-accurate for training to work well. A skilled engineer typically has to place tens of points to create a full mask for a single curved text instance, so full-mask labelling is time-consuming and laborious, particularly for curved text. To reduce the labelling cost, this paper proposes a weakly supervised method. Unlike other detectors (e.g., PSENet or TextSnake) that use full masks, our method needs only coarse masks for training: the coarse mask for a text instance is a line drawn across the text region. Compared with full-mask labelling, line labelling saves annotation time but discards much of the annotation information. To compensate, a network pretrained on synthetic data with full masks is used to enhance the coarse masks of real images, and the enhanced masks are then fed back to train our network. Experiments show that the performance of our method is close to that of fully supervised methods on ICDAR2015, CTW1500, Total-Text, and MSRA-TD500.
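To make the notion of a coarse mask concrete, the following is a minimal sketch of how a line annotation might be rasterized into a binary mask. It is an illustration only, not the authors' implementation: the abstract does not say how the line is represented, so the polyline input, the function name `line_to_coarse_mask`, and the `thickness` parameter are all assumptions.

```python
import numpy as np


def line_to_coarse_mask(points, height, width, thickness=1):
    """Rasterize a polyline into a binary mask (hypothetical helper).

    points    -- list of (x, y) vertices of the annotated line (assumed format)
    thickness -- side length in pixels of the square painted at each sample
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    half = thickness // 2
    for (x0, y0), (x1, y1) in zip(points[:-1], points[1:]):
        # One sample per pixel step along the longer axis of the segment.
        n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        xs = np.rint(np.linspace(x0, x1, n)).astype(int)
        ys = np.rint(np.linspace(y0, y1, n)).astype(int)
        for x, y in zip(xs, ys):
            # Paint a small square around the sample, clipped to the image.
            r0 = min(max(y - half, 0), height)
            r1 = min(max(y + half + 1, 0), height)
            c0 = min(max(x - half, 0), width)
            c1 = min(max(x + half + 1, 0), width)
            mask[r0:r1, c0:c1] = 1
    return mask


# A horizontal line across a small image, standing in for one text instance.
mask = line_to_coarse_mask([(1, 2), (6, 2)], height=5, width=8)
print(mask)
```

A curved text instance would simply use more vertices in `points`. The enhancement step described in the abstract (refining such masks with a network pretrained on synthetic data) is not sketched here, because the abstract gives no detail about it.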