Key Information Extraction and Recognition from Rich Text Images

Tien Do; Thuyen Tran Doan; Khiem Le; Thua Nguyen; Duy-Dinh Le; Thanh Duc Ngo

doi:10.1142/S2196888824500131

Vietnam Journal of Computer Science (Nov 2024)

Affiliations

Tien Do: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Thuyen Tran Doan: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Khiem Le: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Thua Nguyen: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Duy-Dinh Le: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Thanh Duc Ngo: University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam

DOI: https://doi.org/10.1142/S2196888824500131
Journal volume & issue: Vol. 11, no. 04, pp. 569–594

Abstract

Key information extraction and recognition from rich text images are crucial for various applications. The process involves two main tasks: Line Item Recognition (LIR) and Key Information Localization and Extraction (KILE). LIR aims to identify and interpret the data line items in a document. The essential information in each line item is then classified or extracted, a task known as KILE. A widely used approach to this problem is sequence-based; it relies on the generalization ability of a language model and requires a significant amount of training time. We present an effective and reliable solution to the problem that uses RoBERTa, a transformer model trained on a large corpus, together with the LION optimizer to improve the training process. A comprehensive evaluation was conducted on two benchmarks covering two languages, English and Vietnamese. Experimental results on DocILE indicate that the proposed framework significantly improves the KILE task, with a 7.24% increase in accuracy over the baseline, and also enhances the correct recognition rate at the LIR stage. On MCOCR, the method achieved a Character Error Rate (CER) of 28.6%, which is competitive with the state of the art on this dataset.
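
For readers unfamiliar with the metric quoted for MCOCR, the following is a minimal sketch of how a Character Error Rate is commonly computed: the Levenshtein edit distance between the predicted and reference strings, divided by the reference length. It is an illustration under that standard definition only; the abstract does not state the exact evaluation protocol used by the authors (for example, text normalization or averaging across fields), and the function names `levenshtein` and `cer` are hypothetical.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hyp` into `ref` (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length.
    Assumption: a single string pair, with no text normalization applied."""
    if not ref:
        raise ValueError("reference string must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative strings only, not taken from either benchmark.
    print(cer("kitten", "sitting"))
```

Under this definition, lower is better, and the value can exceed 1.0 when the prediction is much longer than the reference.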

Published in Vietnam Journal of Computer Science

ISSN: 2196-8888 (Print); 2196-8896 (Online)
Publisher: World Scientific Publishing
Country of publisher: Singapore
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.worldscientific.com/worldscinet/vjcs
