IEEE Access (Jan 2025)

Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images

  • Junyoung Park,
  • Wonjun Kang,
  • Seonji Park,
  • Keuntek Lee,
  • Hyung Il Koo,
  • Nam Ik Cho

DOI
https://doi.org/10.1109/access.2025.3572001
Journal volume & issue
Vol. 13
pp. 91263 – 91275

Abstract

The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can extract textual content from document images for LLM applications. However, most existing methods have primarily focused on scene text or well-structured document images, and typically limit text detection and recognition to the word level. In this paper, we propose a novel OCR framework capable of detecting and recognizing text at both the text-line and text-block levels. Specifically, we design a new deep neural network (DNN) to replace the Connected Component (CC) extraction and state estimation processes used in conventional methods. Despite being trained solely on synthetic datasets, the proposed OCR system performs robust text detection and layout analysis. Furthermore, we propose a recognition metric to evaluate content preservation in OCR systems and introduce a new OCR benchmark consisting of camera-captured document images. Our method demonstrates superior performance on this benchmark, outperforming existing OCR APIs.

Keywords