Complexity (Jan 2021)
Benchmark Pashto Handwritten Character Dataset and Pashto Object Character Recognition (OCR) Using Deep Neural Network with Rule Activation Function
Abstract
In the area of machine learning, different techniques are used to train machines and perform different tasks like computer vision, data analysis, natural language processing, and speech recognition. Computer vision is one of the main branches where machine learning and deep learning techniques are being applied. Optical character recognition (OCR) is the ability of a machine to recognize the character of a language. Pashto is one of the most ancient and historical languages of the world, spoken in Afghanistan and Pakistan. OCR application has been developed for various cursive languages like Urdu, Chinese, and Japanese, but very little work is done for the recognition of the Pashto language. When it comes to handwritten character recognition, it becomes more difficult for OCR to recognize the characters as every handwritten character’s shape is influenced by the writer’s hand motion dynamics. The reason for the lack of research in Pashto handwritten character data as compared to other languages is because there is no benchmark dataset available for experimental purposes. This study focuses on the creation of such a dataset, and then for the evaluation purpose, a machine is trained to correctly recognize unseen Pashto handwritten characters. To achieve this objective, a dataset of 43000 images was created. Three Feed Forward Neural Network models with backpropagation algorithm using different Rectified Linear Unit (ReLU) layer configurations (Model 1 with 1-ReLU Layer, Model 2 with 2-ReLU layers, and Model 3 with 3-ReLU Layers) were trained and tested with this dataset. The simulation shows that Model 1 achieved accuracy up to 87.6% on unseen data while Model 2 achieved an accuracy of 81.60% and 3% accuracy, respectively. Similarly, loss (cross-entropy) was the lowest for Model 1 with 0.15 and 3.17 for training and testing, followed by Model 2 with 0.7 and 4.2 for training and testing, while Model 3 was the last with loss values of 6.4 and 3.69. The precision, recall, and f-measure values of Model 1 were better than those of both Model 2 and Model 3. Based on results, Model 1 (with 1 ReLU activation layer) is found to be the most efficient as compared to the other two models in terms of accuracy to recognize Pashto handwritten characters.