JISR on Computing (Jul 2015)

Urdu Optical Character Recognition Technique for Jameel Noori Nastaleeq Script

  • Engr. Reema Qaiser Khan,
  • Engr. Wafa Qaiser Khan

DOI
https://doi.org/10.31645/jisrc/(2015).13.1.0011
Journal volume & issue
Vol. 13, no. 1

Abstract

Urdu OCR systems have attracted considerable interest from developers in recent years. Research on Urdu OCR is active, but because of the complexity of Urdu fonts, the technology remains immature and has not yet reached widespread practical use. The main objective of this work was to create a technique that could be applied to any existing Urdu font or script. In this paper, the authors develop a technique that extracts text set in the Urdu font “Jameel Noori Nastaleeq” from images and converts it into editable Unicode text. The approach comprises pre-processing, connected-component labeling, feature extraction, and image comparison. The identified objects are saved as templates, which are then compared against the white-pixel position-length database created by the authors; each matched template is then converted into Unicode.
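
The pipeline named in the abstract (pre-processing, connected-component labeling, feature extraction, comparison against a database) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold, the 8-connectivity, the right-to-left ordering rule, the exact-match lookup, and the toy glyph shapes are all assumptions made for illustration, and real Nastaleeq ligatures are far more complex than the two shapes used here.

```python
from collections import deque

def binarize(gray, threshold=128):
    """Pre-processing (assumed): dark ink becomes white pixel 1, background 0."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def label_components(img):
    """Group white pixels into 8-connected components (lists of (row, col))."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components

def white_runs(pixels):
    """Feature (assumed form): (row, start column, length) of each horizontal
    run of white pixels, relative to the component's bounding box."""
    top = min(y for y, _ in pixels)
    left = min(x for _, x in pixels)
    rows = {}
    for y, x in pixels:
        rows.setdefault(y - top, []).append(x - left)
    runs = []
    for row in sorted(rows):
        cols = sorted(rows[row])
        start = prev = cols[0]
        for col in cols[1:]:
            if col != prev + 1:  # gap found, so close the current run
                runs.append((row, start, prev - start + 1))
                start = col
            prev = col
        runs.append((row, start, prev - start + 1))
    return tuple(runs)

def recognize(img, database):
    """Look up each component's features; order components right to left."""
    comps = sorted(label_components(img), key=lambda p: -max(x for _, x in p))
    return "".join(database.get(white_runs(p), "?") for p in comps)

# Toy binary image: a cup-like shape on the left, a vertical bar on the right.
IMAGE = [
    [0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0, 0, 1],
]

# Hypothetical database mapping run features to Unicode code points.
DATABASE = {
    ((0, 0, 1), (1, 0, 1), (2, 0, 1)): "\u0627",  # toy stand-in for alef
    ((0, 0, 1), (0, 3, 1), (1, 0, 4)): "\u0628",  # toy stand-in for beh
}

text = recognize(IMAGE, DATABASE)
```

An exact dictionary lookup keeps the sketch short; matching scanned text in practice would need some tolerance for noise, which the abstract does not detail.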

Keywords