Jurnal Rekayasa Elektrika (Mar 2024)

Augmentation of Additional Arabic Dataset for Jawi Writing and Classification Using Deep Learning

  • Safrizal Razali
  • Kahlil Muchtar
  • Muhammad Hafiz Rinaldi
  • Yudha Nurdin
  • Aulia Rahman

DOI
https://doi.org/10.17529/jre.v20i1.33722
Journal volume & issue
Vol. 20, no. 1

Abstract


This research creates an additional dataset of Arabic characters for writing Jawi script and trains classification models on it using two deep learning architectures, InceptionV3 and ResNet34. In the first stage, digital image processing is used to build the additional Arabic character dataset from several sources, including HMBD, AHAWP, and HUCD, covering the various connected and disconnected forms of Jawi script. The image processing consists of preprocessing to enhance image quality, segmentation to separate the Arabic characters from the background, and augmentation to increase dataset variability. Once the dataset is formed, each of the InceptionV3 and ResNet34 models is trained on the corresponding training data. In the classification evaluation, the ResNet34 model achieved the best performance, with an accuracy of 96%, and recognized Jawi script accurately and consistently, even for classes with similar shapes. These findings indicate that ResNet34 is well suited to recognizing the additional Arabic characters used for writing Jawi. The main contribution of this research is the additional Arabic character dataset, which can be used for Jawi script recognition and for assessing the performance of various deep learning models. The study also emphasizes the importance of selecting an appropriate architecture for a specific character recognition task. The results have the potential to support further development of Jawi character recognition applications and to provide insights for researchers working on the recognition of characters derived from Arabic script. The dataset augmentation results can be accessed at https://singkat.usk.ac.id/g/En0skCKGAR
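
The three image processing stages named in the abstract (preprocessing, segmentation, augmentation) can be illustrated with a minimal sketch in plain Python. This is not the authors' pipeline: the abstract does not specify the operations used, so the fixed threshold, the bounding-box crop, and the small translations below are assumptions chosen for illustration, and every function name is hypothetical. Images are represented as nested lists of grayscale values with dark ink on a light background.

```python
# Illustrative sketch only. The abstract does not describe the actual
# preprocessing, segmentation, or augmentation operations; the choices
# below (fixed threshold, bounding-box crop, small shifts) are assumptions.

def binarize(image, threshold=128):
    """Preprocessing: map grayscale pixels (0-255) to 1 (ink) or 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]

def crop_to_glyph(binary):
    """Segmentation: crop the image to the bounding box of its ink pixels."""
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return []  # no ink found
    cols = [c for c in range(len(binary[0])) if any(row[c] for row in binary)]
    return [row[cols[0]:cols[-1] + 1] for row in binary[rows[0]:rows[-1] + 1]]

def pad(glyph, margin):
    """Surround the glyph with `margin` background pixels on every side."""
    width = len(glyph[0]) + 2 * margin
    top_bottom = [[0] * width for _ in range(margin)]
    middle = [[0] * margin + list(row) + [0] * margin for row in glyph]
    return [list(r) for r in top_bottom] + middle + [list(r) for r in top_bottom]

def shift(image, dx, dy):
    """Translate the image by (dx, dy) pixels, filling gaps with background."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out

def augment(glyph, margin=1):
    """Augmentation: every translation of the padded glyph within the margin."""
    canvas = pad(glyph, margin)
    offsets = range(-margin, margin + 1)
    return [shift(canvas, dx, dy) for dy in offsets for dx in offsets]

# A tiny made-up 4x4 "scan" with three dark pixels.
sample = [
    [255, 255, 255, 255],
    [255,  10,  20, 255],
    [255, 255,  30, 255],
    [255, 255, 255, 255],
]
glyph = crop_to_glyph(binarize(sample))
variants = augment(glyph, margin=1)
```

Translation is used here rather than mirroring because flipping a character can turn it into a different or invalid glyph. In a real setting, the variants would typically be resized to the input resolution expected by the chosen network before training.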

Keywords