IEEE Access (Jan 2021)

Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image

  • Hassanin M. Al-Barhamtoshy,
  • Kamal M. Jambi,
  • Sherif M. Abdou,
  • Mohsen A. Rashwan

DOI
https://doi.org/10.1109/ACCESS.2021.3066477
Journal volume & issue
Vol. 9
pp. 51242 – 51257

Abstract

Read online

This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) as a dataset and OCR services. Therefore, different services that support document analysis, retrieving, processing including dataset preparation, and recognition will be discussed. Consequently, ADIR services provide general functions of the Arabic OCR to compose many other services in the OCR domain. Furthermore, the proposed work can provide accessing different methods of document layout analysis with a platform where they can share and handle such methods (services) without any setup requirements. One of the used datasets composed from 16,800 Arabic letters written by 60 writers. Each writer wrote each letter from Alif to Ya 10 times in two forms. The forms were scanned at 300 DPI resolution and are segmented in two sets: training set with 13,440 letters for 48 images per class label, and testing set with 3,360 letters to 120 images per class label Convolutional neural network (CNN) is used and adapted for Arabic handwritten letters classification. In an experimental test, we showed that our results outperform 100% classification accuracy rate on testing images. Therefore, the ADIR services provide a “service description”, which includes an interface and a server’s URL. The interface allows communication process between clients and services. Although, in this article we evaluate IR results and compared them with respect to corrected equivalent.

Keywords