IEEE Access (Jan 2024)

Okkhor-Diffusion: Class Guided Generation of Bangla Isolated Handwritten Characters Using Denoising Diffusion Probabilistic Model (DDPM)

  • Md. Mubtasim Fuad,
  • A. Faiyaz,
  • Noor Mairukh Khan Arnob,
  • M. F. Mridha,
  • Aloke Kumar Saha,
  • Zeyar Aung

DOI
https://doi.org/10.1109/ACCESS.2024.3370674
Journal volume & issue
Vol. 12
pp. 37521 – 37539

Abstract


Bangla has a unique script with a complex character set, making it a fascinating subject of study for linguists and cultural enthusiasts. Because some Bangla characters can be distinguished only by subtle differences in their shapes and diacritics, research on Bangla character recognition and classification using machine learning has increased notably. However, training Handwritten Bangla Character Recognition (HBCR) models requires an adequate amount of data from a diversely distributed dataset, and building such diverse datasets is challenging and tedious. Yet there is limited research on the automatic generation of handwritten Bangla characters. Motivated by this open area of research, this paper proposes 'Okkhor-Diffusion', a novel approach for class-guided generation of isolated handwritten Bangla characters using a Denoising Diffusion Probabilistic Model (DDPM). No prior research has used a DDPM for this purpose. A DDPM is a generative model that learns to reverse a diffusion process that gradually corrupts data with noise, which lets it turn noise into diverse samples even when trained on a small training set. In our experiments, StyleGAN2-ADA performed notably worse than Okkhor-Diffusion at generating realistic isolated handwritten Bangla characters. Experimental results on the BanglaLekha-Isolated dataset show that Okkhor-Diffusion generates realistic isolated handwritten Bangla characters, with a mean Multi-Scale Structural Similarity Index Measure (MS-SSIM) score of 0.178, compared with 0.177 for the real samples. The Fréchet Inception Distance (FID) score of the synthetic handwritten Bangla characters is 5.426. Finally, Okkhor-Diffusion achieves a score of 10.388 on the newly proposed Bangla Character Aware Fréchet Inception Distance (BCAFID) metric.
The code for the proposed Okkhor-Diffusion framework is available at https://github.com/MubtasimFuad10/Okkhor-Diffusion.
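The abstract describes the DDPM in a single sentence. The sketch below illustrates, in Python with NumPy, the standard DDPM forward (noising) and reverse (denoising) processes introduced by Ho et al. (2020), with a class label passed to the noise predictor as one simple way to condition generation on a character class. It is a minimal sketch under stated assumptions, not the Okkhor-Diffusion implementation: the linear beta schedule, the step count, the 32x32 image size, and the names `make_schedule`, `q_sample`, `p_sample_loop`, and `eps_model` are all hypothetical, and the placeholder model returns zeros instead of a trained network's predictions.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Hypothetical linear beta schedule (the one used by Ho et al., 2020);
    # Okkhor-Diffusion's actual schedule may differ.
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # abar_t = product of alpha_s for s <= t
    return betas, alphas, alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    # Forward (noising) process in closed form:
    # x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise


def p_sample_loop(eps_model, label, shape, betas, alphas, alpha_bars, rng):
    # Reverse (denoising) process: start from pure Gaussian noise and apply
    # T ancestral sampling steps. eps_model(x_t, t, label) is a hypothetical
    # class-conditional network that predicts the noise contained in x_t.
    x = rng.standard_normal(shape)
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t, label)
        # Posterior mean of x_{t-1} given x_t and the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add fresh noise with variance sigma_t^2 = beta_t
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(shape)
        else:
            x = mean  # no noise is added at the final step
    return x


# Usage with a stand-in image and a placeholder (untrained) model
rng = np.random.default_rng(0)
betas, alphas, alpha_bars = make_schedule(num_steps=50)
x0 = rng.uniform(-1.0, 1.0, size=(32, 32))  # stand-in for a normalized character image
noise = rng.standard_normal(x0.shape)
x_t = q_sample(x0, 49, alpha_bars, noise)  # heavily noised version of x0


def dummy_model(x, t, label):
    return np.zeros_like(x)  # placeholder, not a trained noise predictor


sample = p_sample_loop(dummy_model, label=5, shape=(32, 32),
                       betas=betas, alphas=alphas, alpha_bars=alpha_bars, rng=rng)
```

During training, one would typically draw a random step t, build x_t with `q_sample`, and regress the model's output onto the added noise; the reverse loop above uses the variance choice sigma_t^2 = beta_t, one of the two options discussed by Ho et al.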

Keywords