IEEE Access (Jan 2024)
Advancing Optical Character Recognition for Low-Resource Scripts: A Siamese Meta-Learning Approach With PSN Framework
Abstract
With the increasing demand for digitization, Optical Character Recognition (OCR) systems play a vital role in digitizing physical manuscripts. Several OCR methods have been successfully deployed; however, they often struggle with low-resource regional scripts because training data are limited and the characters have complex structures. Siamese Network (SN) meta-learning offers a promising solution in this setting, as it enables quick adaptation to new tasks with minimal training data. Despite the success of SNs in various classification tasks, the traditional SN architecture needs improvement to better distinguish between similar-looking characters of regional scripts. In this paper, we propose a novel Priority-Smart Network (PSN) framework for traditional SN architectures, which can easily be incorporated into existing CNN backbones to improve their ability to identify characters in low-resource regional scripts. Furthermore, we propose an Enhanced Differential Edge Detection (EDED) preprocessing strategy designed specifically for OCR tasks. We rigorously evaluate the proposed techniques on three benchmark low-resource script datasets to establish their effectiveness. Our experimental results show significant improvements in character recognition accuracy and robustness, highlighting the potential of combining SNs with the PSN framework and the EDED strategy to improve OCR systems for low-resource scripts.
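To make the Siamese comparison idea concrete, the following is a minimal sketch of generic one-shot classification with a Siamese network. It assumes a shared encoder, an L1 distance between embeddings, and a nearest-support decision rule. It is not the PSN framework or the EDED preprocessing proposed in this paper, whose details are not given here. The names `make_encoder`, `similarity`, and `classify_one_shot` are hypothetical. The fixed random projection is an assumed stand-in for a trained CNN backbone, and the character labels are illustrative placeholders.

```python
import math
import random
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def make_encoder(in_dim: int, out_dim: int, seed: int = 0) -> Encoder:
    """Hypothetical stand-in for a shared CNN backbone: a fixed random
    linear projection followed by tanh. It is untrained in this sketch."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x: Sequence[float]) -> Vector:
        # Each output unit is tanh(w . x); tanh keeps every component in (-1, 1).
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


def l1_distance(a: Vector, b: Vector) -> float:
    """Component-wise absolute difference, summed (L1 distance)."""
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def similarity(encode: Encoder, x1: Sequence[float], x2: Sequence[float]) -> float:
    """Siamese score in (0, 1]: both inputs pass through the SAME encoder
    (shared weights), and the score decays with the embedding distance."""
    return math.exp(-l1_distance(encode(x1), encode(x2)))


def classify_one_shot(
    encode: Encoder, query: Sequence[float], support: Dict[str, Sequence[float]]
) -> str:
    """Assign the query to the class whose single support example scores highest."""
    return max(support, key=lambda label: similarity(encode, query, support[label]))


if __name__ == "__main__":
    enc = make_encoder(in_dim=4, out_dim=8, seed=0)
    # One (toy, flattened) example per character class: the one-shot support set.
    support = {
        "ka": [1.0, 0.0, 0.0, 0.0],
        "kha": [0.0, 1.0, 0.0, 0.0],
        "ga": [0.0, 0.0, 1.0, 0.0],
    }
    print(classify_one_shot(enc, [0.0, 1.0, 0.0, 0.0], support))
```

Because both inputs pass through the same `encode` function, the two branches share weights by construction. This is what allows a new character class to be added with a single support example and no retraining. In practice the backbone would first be trained on image pairs (for example, with a contrastive or binary cross-entropy loss), a step this sketch omits.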
Keywords