Arabic Documents Information Retrieval for Printed, Handwritten, and Calligraphy Image

Hassanin M. Al-Barhamtoshy; Kamal M. Jambi; Sherif M. Abdou; Mohsen A. Rashwan

doi:10.1109/ACCESS.2021.3066477

IEEE Access (Jan 2021)

Affiliations

Hassanin M. Al-Barhamtoshy: Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Kamal M. Jambi: Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
Sherif M. Abdou: Information Technology Department, Faculty of Artificial Intelligence, University of Cairo, Giza, Egypt
Mohsen A. Rashwan: Electronics and Communication Department, Faculty of Engineering, Cairo University, Giza, Egypt

DOI: https://doi.org/10.1109/ACCESS.2021.3066477
Journal volume & issue: Vol. 9, pp. 51242–51257

Abstract

This paper presents a new computational backend model that supports Arabic document information retrieval (ADIR) by providing dataset and OCR services. We discuss the services that support document analysis, retrieval, and processing, including dataset preparation and recognition. The ADIR services expose general Arabic OCR functions from which many other services in the OCR domain can be composed. The proposed work also gives access to different document layout analysis methods through a platform where users can share and run such methods (services) without any setup requirements. One of the datasets used is composed of 16,800 Arabic letters written by 60 writers. Each writer wrote each letter from Alif to Ya 10 times in two forms. The forms were scanned at 300 DPI and split into two sets: a training set of 13,440 letters (480 images per class label) and a testing set of 3,360 letters (120 images per class label). A convolutional neural network (CNN) is adapted for Arabic handwritten letter classification. In an experimental test, we report a 100% classification accuracy rate on the testing images. The ADIR services provide a “service description”, which includes an interface and a server’s URL; the interface handles communication between clients and services. In this article, we also evaluate the IR results and compare them with their corrected equivalents.
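The dataset figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is not from the paper; it assumes one class per letter of the 28-letter Arabic alphabet (Alif to Ya), and the constant and function names are illustrative only. Dividing the quoted set sizes by 28 classes gives 480 training images and 120 testing images per class.

```python
# Sanity check of the dataset figures quoted in the abstract.
# Assumption (not stated in the abstract): one class per letter of the
# 28-letter Arabic alphabet, Alif to Ya.
NUM_CLASSES = 28
NUM_WRITERS = 60
REPETITIONS = 10      # times each writer wrote each letter

TRAIN_SIZE = 13_440   # training-set size quoted in the abstract
TEST_SIZE = 3_360     # testing-set size quoted in the abstract


def dataset_summary():
    """Return the derived totals and per-class counts as a dict."""
    total = NUM_CLASSES * NUM_WRITERS * REPETITIONS
    return {
        "total_letters": total,
        "train_per_class": TRAIN_SIZE // NUM_CLASSES,
        "test_per_class": TEST_SIZE // NUM_CLASSES,
        "train_fraction": TRAIN_SIZE / total,
        "split_is_consistent": TRAIN_SIZE + TEST_SIZE == total,
    }


if __name__ == "__main__":
    for key, value in dataset_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions the two sets add up to the full 16,800 letters, which corresponds to an 80/20 train/test split.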

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
