Радіоелектронні і комп'ютерні системи (Feb 2024)

Advanced approach for Moroccan administrative documents digitization using pre-trained models CNN-based: character recognition

  • Ali Benaissa,
  • Abdelkhalak Bahri,
  • Ahmad El Allaoui,
  • My Abdelouahab Salahddine

DOI
https://doi.org/10.32620/reks.2024.1.02
Journal volume & issue
Vol. 2024, no. 1
pp. 17 – 35

Abstract

In the digital age, efficient digitization of administrative documents remains a real challenge, particularly for languages with complex scripts such as those used in Moroccan documents. The subject matter of this article is the digitization of Moroccan administrative documents using pre-trained convolutional neural networks (CNNs) for character recognition. The research addresses the challenges of accurately digitizing the various scripts and layouts found in these documents, which is crucial to the digital transformation of administrative processes. The goal was to develop an efficient and highly accurate character recognition system tailored to Moroccan administrative texts. The tasks involved comprehensive analysis and customization of pre-trained CNN models and rigorous performance testing against a diverse dataset of Moroccan administrative documents. The methodology entailed a detailed evaluation of different CNN architectures trained on a dataset representative of the character types used in Moroccan administrative documents, so that the models would adapt to real-world scenarios, with a focus on accuracy and efficiency in character recognition. The best-performing model varied by dataset. DenseNet121 achieved 95.78% accuracy on the Alphabet dataset, whereas VGG16 recorded 99.24% accuracy on the Digits dataset. DenseNet169 achieved 94.00% accuracy on the Arabic dataset, 99.90% on the Tifinagh dataset, 96.24% on the French Special Characters dataset, and 99.14% on the Symbols dataset. In addition, ResNet50 achieved 99.90% accuracy on the Character Type dataset, enabling accurate determination of the dataset to which a character belongs. In conclusion, this study represents a substantial advancement in the digitization of Moroccan administrative documents: the CNN-based approach significantly outperforms traditional character recognition methods. These findings not only contribute to the digital processing and management of documents but also open new avenues for future research in adapting this technology to other languages and document types.
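
The abstract reports one best model per dataset, plus a ResNet50 classifier that determines which dataset a character belongs to. One way these pieces could fit together at inference time is a two-stage pipeline: predict the character type first, then pass the image to the classifier for that type. The following Python sketch illustrates that idea under stated assumptions. The abstract does not describe the authors' actual pipeline, so the function names (`recognize`, `chained_accuracy_estimate`), the dataset keys, and the independence assumption behind the accuracy estimate are all hypothetical. The classifiers are plain callables standing in for trained CNNs.

```python
from typing import Callable, Dict, Sequence, Tuple

# Best model per dataset and its accuracy, copied from the abstract.
# The dictionary keys are illustrative labels, not names used by the authors.
REPORTED_BEST: Dict[str, Tuple[str, float]] = {
    "alphabet": ("DenseNet121", 0.9578),
    "digits": ("VGG16", 0.9924),
    "arabic": ("DenseNet169", 0.9400),
    "tifinagh": ("DenseNet169", 0.9990),
    "french_special": ("DenseNet169", 0.9624),
    "symbols": ("DenseNet169", 0.9914),
}

# Character Type classifier reported in the abstract.
TYPE_MODEL: Tuple[str, float] = ("ResNet50", 0.9990)

# Stand-in for a trained CNN: takes an image, returns a predicted label.
Classifier = Callable[[Sequence[float]], str]


def recognize(
    image: Sequence[float],
    type_classifier: Classifier,
    script_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Hypothetical two-stage recognition: character type, then character."""
    char_type = type_classifier(image)
    if char_type not in script_classifiers:
        raise KeyError(f"no classifier registered for type {char_type!r}")
    return char_type, script_classifiers[char_type](image)


def chained_accuracy_estimate(char_type: str) -> float:
    """Rough estimate for the two stages combined.

    Assumes both stages must be correct and that their errors are
    independent; this is an assumption, not a result from the paper.
    """
    return TYPE_MODEL[1] * REPORTED_BEST[char_type][1]


if __name__ == "__main__":
    # Stub classifiers standing in for trained models.
    stub_type: Classifier = lambda img: "digits"
    stub_scripts: Dict[str, Classifier] = {"digits": lambda img: "7"}
    print(recognize([0.0, 1.0], stub_type, stub_scripts))
    for name in REPORTED_BEST:
        print(name, round(chained_accuracy_estimate(name), 4))
```

Under these assumptions, the combined estimate for each dataset is the product of the two reported accuracies, so the type classifier's 99.90% accuracy lowers the combined figure only slightly. The actual end-to-end accuracy would need to be measured on complete documents.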

Keywords