Data in Brief (Dec 2023)

Generation of simulated data for Bengali text localization in natural images

  • Sourav Saha,
  • Md. Easin Arafat,
  • Md Aminul Haque Palash,
  • Dewan Md Farid,
  • M. Shamim Kaiser

Journal volume & issue
Vol. 51
p. 109568

Abstract

In vision-based applications, the importance of text cannot be overstated, because text conveys accurate and comprehensive information. Scene text editing systems modify and enhance the text in natural images while preserving the overall visual layout. Preserving the original background context and font style during editing is extremely difficult, however, because the edited text must blend seamlessly into the image while the rest of the scene stays unchanged. This article presents simulated data on the dynamic features of digital image editing, advertising, content development, and related fields. The dataset comprises seven components: 2D simulated text on the styled image (i_s), text image (i_t), masking of text (mask_t), real background image (t_b), real sample image (t_f), text skeleton (t_sk), and text styled image (t_t). The source dataset contains background images, color variations, fonts, and text content, while the synthetic dataset consists of 49,000 randomly generated images. The dataset gives researchers and practitioners a rich resource for identifying and evaluating these dynamic features. The dataset is publicly accessible at: https://data.mendeley.com/datasets/h9kry9y46s/3
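
As an illustration of how the seven components listed above might be paired per sample, the following is a minimal Python sketch. It assumes, hypothetically, that each component is stored in its own sub-directory named with an underscore form of its label (i_s, i_t, mask_t, t_b, t_f, t_sk, t_t) and that the files belonging to one sample share a file name; neither the layout nor the .png extension is stated in the abstract, so check the published archive before relying on it. The function name collect_samples is likewise only illustrative.

```python
from pathlib import Path

# Component labels come from the abstract. Treating them as sub-directory
# names (with underscores) is an assumption made for this sketch.
COMPONENTS = ("i_s", "i_t", "mask_t", "t_b", "t_f", "t_sk", "t_t")


def collect_samples(root, pattern="*.png"):
    """Return {stem: {component: path}} for stems found in every component folder.

    `pattern` is an assumed file extension; adjust it to the real archive.
    """
    root = Path(root)
    by_stem = {}
    for comp in COMPONENTS:
        comp_dir = root / comp
        if not comp_dir.is_dir():
            # Missing folder: no sample can be complete, but keep scanning.
            continue
        for path in sorted(comp_dir.glob(pattern)):
            by_stem.setdefault(path.stem, {})[comp] = path
    # Keep only samples that have all seven components.
    return {
        stem: paths
        for stem, paths in by_stem.items()
        if len(paths) == len(COMPONENTS)
    }
```

Calling collect_samples on the extracted dataset folder would return one entry per complete sample, which can then be passed to whatever image loader a given training pipeline uses.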

Keywords