uTHCD: A New Benchmarking for Tamil Handwritten OCR

Noushath Shaffi; Faizal Hajamohideen

doi:10.1109/ACCESS.2021.3096823

IEEE Access (Jan 2021)

Affiliations

Noushath Shaffi: Department of Information Technology, University of Technology and Applied Sciences, Suhar, PC, Oman
Faizal Hajamohideen: Department of Information Technology, University of Technology and Applied Sciences, Suhar, PC, Oman

DOI: https://doi.org/10.1109/ACCESS.2021.3096823
Journal volume & issue: Vol. 9
pp. 101469 – 101493

Abstract

The robustness of a typical handwritten character recognition system relies on the availability of comprehensive supervised data samples. Considerable work has been reported in the literature on creating databases for several Indic scripts, but the Tamil script has only one standardized database to date. This paper presents the creation of an exhaustive and extensive unconstrained Tamil Handwritten Character Database (uTHCD). The samples were collected from around 850 native Tamil volunteers, including schoolchildren, homemakers, university students, and faculty. The database consists of about 91,000 samples, with nearly 600 samples in each of 156 classes. This isolated character database is made publicly available both as raw images and as a compressed Hierarchical Data Format (HDF) file. The paper also presents several possible use cases of the proposed uTHCD database, using Convolutional Neural Networks (CNNs) to classify handwritten Tamil characters. Several experiments demonstrate that training on the proposed database helps traditional and contemporary classifiers perform on par with or better than training on the existing dataset when tested on unseen data. With this database, we expect to set a new benchmark in Tamil handwritten character recognition and to provide a launchpad for developing robust language technologies for the Tamil script.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
