Data in Brief (Oct 2024)

GHCR—A dataset for Grantha handwritten character recognition

  • Basaraboyina Yohoshiva,
  • Nagendra Panini Challa

Journal volume & issue
Vol. 56
p. 110783

Abstract


This dataset presents a comprehensive collection of handwritten Grantha characters, comprising numbers and vowels, gathered from participants spanning diverse age groups. Participants were instructed to handwrite the Grantha characters on standard A4 sheets. The Grantha script encompasses 10 numbers and 34 vowels, so the Grantha Character dataset covers 44 distinct characters. For each number and each vowel, 133 handwritten samples were collected. These samples underwent digitization and preprocessing steps, including segmentation, resizing, and grayscale conversion. The final dataset consists of 5852 images: 1330 samples for numbers (10 × 133) and 4522 samples for vowels (34 × 133). The data is provided in both image and CSV formats, accompanied by corresponding labels, facilitating its use in machine learning model development. Because few datasets are available for the Grantha script, this contribution addresses a significant gap by providing a benchmark dataset for Grantha numeral and vowel recognition. Moreover, this novel dataset serves as a foundational resource for initiating machine learning research in Indian languages that have historical connections to the Grantha script.
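The abstract names the preprocessing steps (grayscale conversion, resizing) and the CSV export, but not their parameters. The sketch below illustrates those steps in plain Python under stated assumptions that are not taken from the paper: ITU-R BT.601 luma weights for grayscale conversion, nearest-neighbour resizing to an arbitrary target size, and a hypothetical CSV layout with the label first followed by the flattened pixel values. It also checks the sample-count arithmetic (133 samples per class across 10 numbers and 34 vowels).

```python
import csv
import io

# Class counts from the abstract: 10 numbers, 34 vowels, 133 samples per class.
NUM_NUMBERS = 10
NUM_VOWELS = 34
SAMPLES_PER_CLASS = 133


def to_grayscale(rgb_image):
    """Convert a 2-D list of (R, G, B) tuples to 0-255 luma values.

    Assumption: BT.601 weights; the paper does not state its formula.
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def resize_nearest(gray, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixel values.

    Assumption: the target size and interpolation method are illustrative only.
    """
    in_h, in_w = len(gray), len(gray[0])
    return [
        [gray[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def to_csv_row(label, gray):
    """Flatten an image into a hypothetical CSV row: label first, then pixels."""
    return [label] + [p for row in gray for p in row]


# Totals implied by the per-class counts.
total_numbers = NUM_NUMBERS * SAMPLES_PER_CLASS
total_vowels = NUM_VOWELS * SAMPLES_PER_CLASS
total_images = total_numbers + total_vowels

# Demo: a synthetic 4x4 RGB "scan" with a dark diagonal stroke on white.
white, ink = (255, 255, 255), (20, 20, 20)
scan = [[ink if x == y else white for x in range(4)] for y in range(4)]
small = resize_nearest(to_grayscale(scan), 2, 2)

buf = io.StringIO()
csv.writer(buf).writerow(to_csv_row("hypothetical_label", small))
print(buf.getvalue().strip())  # hypothetical_label,20,255,255,20
print(total_numbers, total_vowels, total_images)  # 1330 4522 5852
```

Nearest-neighbour resizing is used here only because it needs no imaging library; a real pipeline would more likely rely on an image-processing package and an interpolation method suited to thin pen strokes.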

Keywords