IEEE Access (Jan 2019)

Improving Urdu Recognition Using Character-Based Artistic Features of Nastalique Calligraphy

  • Qurat Ul Ain Akram,
  • Sarmad Hussain

DOI
https://doi.org/10.1109/ACCESS.2018.2887103
Journal volume & issue
Vol. 7
pp. 8495 – 8507

Abstract


State-of-the-art Urdu recognition approaches for Nastalique use features along with the sequence of character labels for classification and recognition. In Arabic-like cursive scripts, characters are joined together to form a ligature. Conventional methods process the connected stroke of a ligature as a sequence of characters. However, the connected stroke of a ligature image is a sequence of pairs of characters and their joiners, rather than a sequence of characters. Each character has a distinctive shape that clearly distinguishes it from other characters, and its joiner preserves the shape of the stroke connecting it to the next character. In this paper, an implicit Urdu character recognition technique is presented for the Nastalique writing style that is based on the recognition of characters and joiners. A detailed analysis of Nastalique calligraphy is carried out to extract the artistic features of characters and their joiners. The presented technique is tested on Dataset-1, which has 1446 ligature classes covering 3309762 ligature instances and 91129 unique Urdu words. In addition, the system is tested on 1600 text lines of the UPTI dataset, referred to as Dataset-2. The character recognition accuracies are 95.58% and 98.37% on Dataset-1 and Dataset-2, respectively. The results show that the system outperforms state-of-the-art Urdu recognition techniques based on hidden Markov models and deep learning.

Keywords