IEEE Access (Jan 2025)
A Novel Framework for Saraiki Script Recognition Using Advanced Machine Learning Models (YOLOv8 and CNN)
Abstract
In recent years, many local languages have needed closer attention in terms of knowledge exchange. Each language forms a sophisticated system that allows its users to interact, communicate, express ideas, and state demands. Linguistics examines these systems holistically, with an emphasis on the structure, usage, and social dimensions of language; its fundamental goal is to understand how language functions in composition and context. Machine learning, by contrast, is the study of how to teach machines to learn and predict from data rather than through explicit programming. Combining the two domains has made machine learning a potent instrument in linguistics, improving our capacity to comprehend semantics, analyze verbal patterns, and even simulate human-like replies. This work represents a major step forward in the use of machine learning models for the detection and replication of handwritten Saraiki text, as Saraiki is an important language. The aim was to build accurate and reliable OCR systems tailored to the Saraiki script. The study used Convolutional Neural Networks (CNNs) in conjunction with YOLOv8 models to address the problem of recognizing the primary and secondary components of Saraiki alphabets. The models were carefully trained, and several data augmentation approaches were used to improve their performance. Four distinct models (A, B, C, and D) were trained, and Model D consistently performed better than the others. The study also introduces the “Saraiki Handwritten Characters” (SHC) dataset, which contains well-labeled, segmented characters evaluated by annotators. This research provides insight into machine learning models for the Saraiki script and opens the door to Saraiki text processing, transcription, and document digitization, contributing to OCR and showing how machine learning can help preserve linguistic diversity.
Keywords