ELCVIA Electronic Letters on Computer Vision and Image Analysis (Jun 2022)
Attention-based CNN-ConvLSTM for Handwritten Arabic Word Extraction
Abstract
Word extraction is one of the most critical steps in handwriting recognition systems. It is challenging for many reasons, such as the variability of handwriting styles, touching and overlapping characters, skew, and the presence of diacritics, ascenders, and descenders. In this work, we propose a deep-learning-based approach for handwritten Arabic word extraction. We use an Attention-based CNN-ConvLSTM (Convolutional Long Short-Term Memory) network followed by a CTC (Connectionist Temporal Classification) function. First, the essential features of the input text-line image are extracted using an Attention-based Convolutional Neural Network (CNN). The extracted features and the text line's transcription are then passed to a ConvLSTM, which learns a mapping between them. Finally, CTC is used to learn the alignment between text-line images and their transcriptions automatically. We tested the proposed model on a challenging dataset, the KFUPM Handwritten Arabic Text database (KHATT \cite{khatt}), which consists of complex patterns of handwritten Arabic text lines. The experimental results show the effectiveness of this combination, which achieves an extraction success rate of 91.7\%.
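The CTC stage scores a transcription by summing the probabilities of every frame-to-label alignment, which is what allows training without character or word positions. The following is a minimal pure-Python sketch of that forward recursion, with best-path decoding and a word split. It does not reproduce the authors' code. Assumptions not taken from the paper: class 0 is the CTC blank, class 1 is a hypothetical word-separator label, and probabilities are kept in linear space for readability (practical implementations such as PyTorch's `torch.nn.CTCLoss` work in log space). The `split_words` step is only an illustration of how decoded labels could be grouped into words.

```python
import math
from itertools import product

BLANK = 0  # assumption: class 0 is the CTC blank
SPACE = 1  # hypothetical: class 1 marks the gap between two words


def collapse(path, blank=BLANK):
    """CTC collapsing map: merge repeated labels, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != blank:
            out.append(k)
        prev = k
    return out


def ctc_prob(probs, target, blank=BLANK):
    """P(target | probs), summed over all alignments via the forward recursion.

    probs: T >= 1 frames, each a list of per-class softmax probabilities.
    target: label indices without blanks.
    """
    ext = [blank]  # blank-interleaved target: _, l1, _, l2, ..., _
    for label in target:
        ext += [label, blank]
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]  # start on the leading blank...
    if S > 1:
        alpha[1] = probs[0][ext[1]]  # ...or directly on the first label
    for frame in probs[1:]:
        new = [0.0] * S
        for s in range(S):
            a = alpha[s] + (alpha[s - 1] if s >= 1 else 0.0)
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]
            new[s] = a * frame[ext[s]]
        alpha = new
    # A valid alignment ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def ctc_loss(probs, target):
    """Negative log-likelihood minimized during CTC training."""
    return -math.log(ctc_prob(probs, target))


def brute_force_prob(probs, target):
    """Reference check: enumerate every frame-level path (tiny inputs only)."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == list(target):
            p = 1.0
            for frame, k in zip(probs, path):
                p *= frame[k]
            total += p
    return total


def greedy_decode(probs):
    """Best-path decoding: arg max per frame, then collapse."""
    return collapse(max(range(len(f)), key=f.__getitem__) for f in probs)


def split_words(labels, space=SPACE):
    """Group decoded labels into words at the hypothetical SPACE label."""
    words, current = [], []
    for k in labels:
        if k == space:
            if current:
                words.append(current)
            current = []
        else:
            current.append(k)
    if current:
        words.append(current)
    return words


# Toy output for 4 frames over 4 classes: blank, space, and two letters.
probs = [
    [0.1, 0.1, 0.7, 0.1],
    [0.6, 0.1, 0.2, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
decoded = greedy_decode(probs)
print(decoded, split_words(decoded))  # [2, 1, 3] [[2], [3]]
print(math.isclose(ctc_prob(probs, decoded), brute_force_prob(probs, decoded)))  # True
```

The brute-force check only scales to a handful of frames; the forward recursion computes the same quantity in O(T·S) time, which is why CTC training remains tractable on full text lines.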