Deep Sparse Auto-Encoder Features Learning for Arabic Text Recognition

Najoua Rahal; Maroua Tounsi; Amir Hussain; Adel M. Alimi

doi:10.1109/ACCESS.2021.3053618

IEEE Access (Jan 2021)

Affiliations

Najoua Rahal: Faculty of Sciences of Tunis, Tunis El Manar University, Tunis, Tunisia
Maroua Tounsi: REsearch Groups in Intelligent Machines (REGIM-Laboratory), National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia
Amir Hussain: School of Computing, Edinburgh Napier University, Edinburgh, U.K.
Adel M. Alimi: REsearch Groups in Intelligent Machines (REGIM-Laboratory), National Engineering School of Sfax (ENIS), University of Sfax, Sfax, Tunisia

DOI: https://doi.org/10.1109/ACCESS.2021.3053618
Journal volume & issue: Vol. 9, pp. 18569–18584

Abstract

Arabic text recognition remains one of the challenging open problems in pattern recognition and artificial intelligence. Difficulties arise from the cursive nature of Arabic writing, similarities between characters, an unlimited vocabulary, and the use of multiple sizes and mixed fonts. Handling these challenges requires a robust recognition system that combines discriminative features with a rigorous classifier to achieve improved performance. In this work, we introduce a new deep learning based system that recognizes Arabic text contained in images. We propose a novel hybrid network that combines a Bag-of-Features (BoF) framework, which uses a deep Sparse Auto-Encoder (SAE) for feature extraction, with Hidden Markov Models (HMMs) for sequence recognition. The proposed system, termed BoF-deep SAE-HMM, is tested on four datasets: the printed Arabic line images of Printed KHATT (P-KHATT), the benchmark printed word images of Arabic Printed Text Image (APTI), the benchmark handwritten Arabic word images of IFN/ENIT, and the benchmark handwritten digit images of Modified National Institute of Standards and Technology (MNIST).
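
As a rough illustration of the feature-extraction stage named in the abstract, the Python sketch below cuts a text-line image into overlapping vertical frames, passes each frame through a sigmoid encoder layer, and evaluates the KL-divergence sparsity term commonly used to train sparse auto-encoders. It is not the authors' implementation: the function names, the window width and step, the hidden size, the sparsity target `rho`, and the random untrained weights are all assumptions made for illustration, and the BoF codebook step and the HMM decoder are omitted.

```python
import numpy as np


def sliding_windows(image, width=8, step=4):
    """Cut a (height, n_cols) line image into overlapping vertical frames.

    Returns an array of shape (n_frames, height * width), one flattened
    frame per row, in column order.
    """
    n_cols = image.shape[1]
    if n_cols < width:
        raise ValueError("image is narrower than the window width")
    starts = range(0, n_cols - width + 1, step)
    return np.stack([image[:, s:s + width].ravel() for s in starts])


def sae_encode(frames, weights, bias):
    """Sigmoid encoder layer of an auto-encoder: one hidden code per frame."""
    return 1.0 / (1.0 + np.exp(-(frames @ weights + bias)))


def kl_sparsity_penalty(codes, rho=0.05, eps=1e-8):
    """KL-divergence sparsity term often used to train sparse auto-encoders.

    Compares the target mean activation rho with the observed mean
    activation of each hidden unit, summed over all hidden units.
    """
    rho_hat = np.clip(codes.mean(axis=0), eps, 1.0 - eps)
    return float(np.sum(
        rho * np.log(rho / rho_hat)
        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))
    ))


# Toy usage with random data and untrained weights (placeholders only).
rng = np.random.default_rng(0)
line_image = rng.random((32, 100))             # hypothetical 32 x 100 text line
frames = sliding_windows(line_image)           # shape (24, 256)
weights = rng.normal(scale=0.1, size=(256, 16))  # assumed hidden size of 16
bias = np.zeros(16)
codes = sae_encode(frames, weights, bias)      # shape (24, 16)
penalty = kl_sparsity_penalty(codes)
```

In a full pipeline of the kind the abstract describes, the resulting sequence of per-frame codes would be the observation sequence handed to the HMM recognizer. The sparsity term would be added to the reconstruction loss while training the encoder.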

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
