IEEE Access (Jan 2023)

Scene Text Segmentation via Multi-Task Cascade Transformer With Paired Data Synthesis

  • Quang-Vinh Dang,
  • Guee-Sang Lee

DOI
https://doi.org/10.1109/ACCESS.2023.3292264
Journal volume & issue
Vol. 11
pp. 67791 – 67805

Abstract


Scene text segmentation has a wide range of practical applications. However, the available scene text segmentation datasets contain too few images to train deep learning-based models effectively, which limits their performance. To address this problem, we use Text Image-conditional GANs to generate paired data, securing sufficient training samples for text segmentation. Furthermore, existing models learn text attributes such as size, layout, font, and structure only implicitly, which hinders their performance. To remedy this, we propose a Multi-task Cascade Transformer network that explicitly learns these attributes from large volumes of generated synthetic data. The transformer-based network includes two auxiliary tasks and one main task for text segmentation. The auxiliary tasks support the main task by helping the network learn which text regions to focus on, as well as the structure of text across different words and fonts. To bridge the gap between datasets, we first train the proposed network on paired synthetic data and then fine-tune it on real data. Experiments on publicly available scene text segmentation datasets show that our method outperforms existing methods.
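The abstract states that the network is trained on two auxiliary tasks alongside the main segmentation task. One common way to combine such tasks is a weighted sum of per-task losses. The minimal sketch below assumes that form, uses per-pixel binary cross-entropy for every task, and introduces hypothetical names (`multitask_loss`, `w_region`, `w_struct`) and default weights purely for illustration. It is not the authors' published objective; their exact losses and weights are not given here.

```python
import math
from typing import Sequence


def binary_cross_entropy(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy between predicted probabilities and 0/1 targets."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def multitask_loss(
    seg_pred, seg_gt,          # main task: text segmentation mask (flattened pixels)
    region_pred, region_gt,    # hypothetical auxiliary task 1: text-region map
    struct_pred, struct_gt,    # hypothetical auxiliary task 2: text-structure map
    w_region: float = 0.5,     # assumed weight, not from the paper
    w_struct: float = 0.5,     # assumed weight, not from the paper
) -> float:
    """Hypothetical combined objective: main loss plus two weighted auxiliary losses."""
    main = binary_cross_entropy(seg_pred, seg_gt)
    aux_region = binary_cross_entropy(region_pred, region_gt)
    aux_struct = binary_cross_entropy(struct_pred, struct_gt)
    return main + w_region * aux_region + w_struct * aux_struct


if __name__ == "__main__":
    # Toy 4-pixel example; in the described pipeline the same objective would be
    # applied first on paired synthetic data, then during fine-tuning on real data.
    seg_pred, seg_gt = [0.9, 0.2, 0.8, 0.1], [1.0, 0.0, 1.0, 0.0]
    print(multitask_loss(seg_pred, seg_gt, seg_pred, seg_gt, seg_pred, seg_gt))
```

Weighting the auxiliary terms below the main term is one way to let them guide the shared features without dominating the segmentation objective; the right balance would have to be tuned empirically.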

Keywords