A vast dataset for Kurdish handwritten digits and isolated characters recognition

Peshraw Ahmed Abdalla; Abdalbasit Mohammed Qadir; Mohammed Y. Shakor; Ari M. Saeed; Abdalla Taha Jabar; Ali Abdalla Salam; Hedi Hamid Hama Amin

Data in Brief (Apr 2023)

Affiliations

Peshraw Ahmed Abdalla: Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq; Corresponding author.
Abdalbasit Mohammed Qadir: Department of Computer Science, College of Science and Technology, University of Human Development, Sulaimaniyah, Iraq
Mohammed Y. Shakor: Department of English, College of Education, University of Garmian, Kalar, Iraq
Ari M. Saeed: Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq
Abdalla Taha Jabar: Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq
Ali Abdalla Salam: Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq
Hedi Hamid Hama Amin: Department of Computer Science, College of Science, University of Halabja, Halabja, Iraq

Journal volume & issue: Vol. 47, p. 109014

Abstract

This article presents two large datasets of handwritten Central Kurdish digits and isolated characters, named K-ZHMARA and K-PIT. The first, K-ZHMARA, contains 70,000 images of Kurdish digits (7000 images per digit). Its data were collected on printed A4 sheets containing a 10 × 10 grid. The second, K-PIT, covers the Kurdish characters rather than the digits and contains 245,000 images (7000 images per character, that is, 35 characters). Its data were collected on printed A4 sheets containing a 12 × 10 grid. Together, the two datasets comprise 315,000 images. Python was used to scan each sheet and to segment, crop, resize, binarize, and invert the images using edge detection and image processing techniques.
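The per-sheet preprocessing named in the abstract (segment, crop, resize, binarize, invert) can be illustrated with a minimal NumPy sketch. This is not the authors' code. It assumes the scanned sheet has already been deskewed and trimmed to the grid's outer border, so the edge-detection step the authors use to locate the grid is omitted. The helper names, the 10% margin, the threshold of 128, and the 28 × 28 output size are hypothetical choices, not values reported in the article. Nearest-neighbour resizing is used only to keep the example dependency-free.

```python
from typing import List

import numpy as np


def split_grid(page: np.ndarray, rows: int, cols: int) -> List[np.ndarray]:
    """Split an aligned grayscale sheet into rows*cols equal cells, row by row.

    Assumes (hypothetically) that the sheet is already cropped to the grid border.
    """
    h, w = page.shape
    ch, cw = h // rows, w // cols
    return [page[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            for r in range(rows) for c in range(cols)]


def crop_margin(cell: np.ndarray, frac: float = 0.1) -> np.ndarray:
    """Trim a fraction of each side to drop the printed grid lines (assumed 10%)."""
    h, w = cell.shape
    dy, dx = int(h * frac), int(w * frac)
    return cell[dy:h - dy, dx:w - dx]


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize to size x size using integer index sampling."""
    h, w = img.shape
    ys = (np.arange(size) * h) // size
    xs = (np.arange(size) * w) // size
    return img[np.ix_(ys, xs)]


def binarize_invert(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize with a fixed threshold and invert: dark ink -> 255, paper -> 0."""
    return np.where(img < threshold, 255, 0).astype(np.uint8)


def preprocess_page(page: np.ndarray, rows: int, cols: int,
                    size: int = 28) -> List[np.ndarray]:
    """Apply segment -> crop -> resize -> binarize/invert to every grid cell."""
    return [binarize_invert(resize_nearest(crop_margin(cell), size))
            for cell in split_grid(page, rows, cols)]
```

For example, a synthetic white 400 × 400 sheet split as a 10 × 10 grid yields 100 cells. A sheet split as 12 × 10 yields 120 cells. Inverting after binarization gives white strokes on a black background, a common convention for digit-recognition datasets.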

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/