Enhancing optical character recognition: Efficient techniques for document layout analysis and text line detection

Amirreza Fateh; Mansoor Fateh; Vahid Abolghasemi

doi:10.1002/eng2.12832

Engineering Reports (Sep 2024)

Affiliations

Amirreza Fateh: School of Computer Engineering, Iran University of Science and Technology (IUST), Tehran, Iran
Mansoor Fateh: Faculty of Computer Engineering, Shahrood University of Technology, Shahrud, Iran
Vahid Abolghasemi: School of Computer Science and Electronic Engineering, University of Essex, Colchester, UK

DOI: https://doi.org/10.1002/eng2.12832
Journal volume & issue: Vol. 6, no. 9

Abstract

In recent years, automatic document and text analysis has gained significant importance, driven by advances in optical character recognition (OCR) technology and the need to process large volumes of printed or handwritten documents efficiently. This article focuses on document layout analysis (DLA) and text line detection (TLD), both of which are crucial components of OCR systems. Our objective is to develop an effective method for extracting both textual and non-textual regions, addressing challenges specific to Persian and to languages written in Persian-like scripts. In the DLA stage, we employ deep learning models and a voting system to accurately determine the regions of interest. In the TLD stage, we introduce techniques such as optimum font size concepts, angle correction, and a line curvature elimination algorithm to enhance OCR accuracy. Comparative evaluations against state-of-the-art methods demonstrate the superiority of our approach, showing a 2.8% improvement in the accuracy of Tesseract-OCR 5.1.0 (a well-established open-source OCR engine) on the official Iranian newspapers dataset. These findings underscore the importance of addressing DLA and TLD challenges to advance OCR technology for Persian-language documents and provide a solid foundation for future research in this domain.
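The abstract does not say how the outputs of the deep learning models are combined in the DLA voting stage. As a minimal sketch, assuming each model emits a per-pixel label map with a hypothetical encoding (0 = background, 1 = text, 2 = non-text), a pixel-wise majority vote could look like the following. The function name `majority_vote` and the rule of falling back to `tie_label` when no label has a strict plurality are illustrative assumptions, not the authors' voting scheme.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical label encoding: 0 = background, 1 = text, 2 = non-text.
LabelMap = List[List[int]]


def majority_vote(maps: Sequence[LabelMap], tie_label: int = 0) -> LabelMap:
    """Fuse per-pixel labels from several segmentation models by majority vote.

    Illustrative sketch only: when the two most frequent labels at a pixel
    receive the same number of votes, the pixel is assigned `tie_label`
    (an assumed, conservative default of background).
    """
    if not maps:
        raise ValueError("need at least one label map")
    height = len(maps[0])
    width = len(maps[0][0]) if height else 0
    # Every model must produce a map of identical shape.
    for m in maps:
        if len(m) != height or any(len(row) != width for row in m):
            raise ValueError("all label maps must have the same shape")

    fused: LabelMap = []
    for y in range(height):
        row = []
        for x in range(width):
            # Count the votes cast by all models for this pixel.
            counts = Counter(m[y][x] for m in maps).most_common()
            top_label, top_count = counts[0]
            # A tie exists if the runner-up has as many votes as the leader.
            if len(counts) > 1 and counts[1][1] == top_count:
                row.append(tie_label)
            else:
                row.append(top_label)
        fused.append(row)
    return fused


if __name__ == "__main__":
    model_a = [[1, 1], [2, 0]]
    model_b = [[1, 2], [2, 0]]
    model_c = [[0, 1], [1, 0]]
    print(majority_vote([model_a, model_b, model_c]))
```

With three models, each pixel takes the label chosen by at least two of them, so the example prints `[[1, 1], [2, 0]]`. An odd number of voters avoids most ties; weighting votes by per-model confidence would be a natural variation, but whether the paper does so is not stated in the abstract.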

Published in Engineering Reports

ISSN: 2577-8196 (Online)
Publisher: Wiley
Country of publisher: United Kingdom
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://onlinelibrary.wiley.com/journal/25778196
