Segmentation-based, omnifont printed Arabic character recognition without font identification

Aziz Qaroush; Abdalkarim Awad; Mohammad Modallal; Malik Ziq

Journal of King Saud University: Computer and Information Sciences (Jun 2022)

Affiliations

Aziz Qaroush (corresponding author): Department of Electrical and Computer Engineering, Birzeit University, Palestine
Abdalkarim Awad: Department of Electrical and Computer Engineering, Birzeit University, Palestine
Mohammad Modallal: Department of Electrical and Computer Engineering, Birzeit University, Palestine
Malik Ziq: Department of Electrical and Computer Engineering, Birzeit University, Palestine

Journal volume & issue: Vol. 34, no. 6
pp. 3025 – 3039

Abstract

Optical character recognition (OCR) is an essential part of many real-world applications, such as digital archiving, automatic number plate recognition, and cheque processing. However, developing an OCR system for printed Arabic text remains a challenging, open research problem because of the special characteristics of cursive Arabic script. In this paper, we propose a segmentation-based, omnifont, open-vocabulary OCR approach for printed Arabic text. The proposed approach does not require an explicit font-type recognition stage. It uses an explicit, indirect character segmentation method that is baseline dependent and employs a hybrid, three-step character segmentation algorithm to handle overlapping characters. In addition, it uses a set of topological features designed and generalized to make the segmentation font independent. The segmented characters are fed to a convolutional neural network for feature extraction and recognition. The APTID-MF data set was used for testing and evaluation. The average accuracy of the segmentation stage is 95%, while the average accuracy of the recognition stage is 99.97%. The whole approach achieves an average accuracy of 95% without font-type recognition or any post-processing.
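
The abstract describes the segmentation method as baseline dependent but does not say how the baseline is located. As an illustration only, the sketch below uses a horizontal projection profile, a common way to estimate the baseline of a printed Arabic text line; it is an assumption, not the authors' method. The function name `estimate_baseline` and the toy image are hypothetical.

```python
def estimate_baseline(image):
    """Return the index of the row containing the most ink pixels.

    `image` is a list of rows, each a list of 0/1 values (1 = ink).
    Assumption: the baseline is the densest row of a single text line,
    which is a common heuristic and not taken from the paper.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]
    # Pick the densest row; max() returns the first one on ties.
    return max(range(len(profile)), key=profile.__getitem__)


# Toy 5x6 binary "text line": row 3 carries the connecting stroke.
line = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
]
baseline_row = estimate_baseline(line)
```

A real pipeline would first binarize and deskew the page and split it into text lines; the heuristic above only makes sense on one line at a time.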

Published in Journal of King Saud University: Computer and Information Sciences

ISSN: 1319-1578 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-computer-and-information-sciences/
