Development of OCR Service for Page-Level Recognition for Camera-Captured Document Images

Junyoung Park; Wonjun Kang; Seonji Park; Keuntek Lee; Hyung Il Koo; Nam Ik Cho

doi:10.1109/access.2025.3572001

IEEE Access (Jan 2025)

Affiliations

Junyoung Park: Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea
Wonjun Kang: Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea
Seonji Park: Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea
Keuntek Lee: Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea
Hyung Il Koo: FuriosaAI, Seoul, South Korea
Nam Ik Cho: Department of Electrical and Computer Engineering, INMC, Seoul National University, Seoul, South Korea

DOI: https://doi.org/10.1109/access.2025.3572001
Journal volume & issue: Vol. 13
pp. 91263 – 91275

Abstract

The emergence of Large Language Models (LLMs) has driven significant advancements in Natural Language Processing (NLP) and introduced new text-related applications, such as Visual Question Answering (VQA). As a result, there is a growing need for Optical Character Recognition (OCR) systems that can extract textual content from document images for LLM applications. However, most existing methods have primarily focused on scene text or well-structured document images, and typically limit text detection and recognition to the word level. In this paper, we propose a novel OCR framework capable of detecting and recognizing text at both the text-line and text-block levels. Specifically, we design a new deep neural network (DNN) to replace the Connected Component (CC) extraction and state estimation processes used in conventional methods. Despite being trained solely on synthetic datasets, the proposed OCR system performs robust text detection and layout analysis. Furthermore, we propose a recognition metric to evaluate content preservation in OCR systems and introduce a new OCR benchmark consisting of camera-captured document images. Our method demonstrates superior performance on this benchmark, outperforming existing OCR APIs.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
