Pre-training transformer with dual-branch context content module for table detection in document images

Yongzhi Li; Pengle Zhang; Meng Sun; Jin Huang; Ruhan He

Virtual Reality & Intelligent Hardware (Oct 2024)

Affiliations

Yongzhi Li: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430064, China
Pengle Zhang: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430064, China
Meng Sun: School of Computer Science, South-Central Minzu University, Wuhan 430064, China
Jin Huang: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430064, China; Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan Textile University, Wuhan 430064, China; Corresponding author.
Ruhan He: School of Computer Science and Artificial Intelligence, Wuhan Textile University, Wuhan 430064, China; Hubei Provincial Engineering Research Center for Intelligent Textile and Fashion, Wuhan Textile University, Wuhan 430064, China

Journal volume & issue: Vol. 6, no. 5, pp. 408–420

Abstract

Background: Document images such as statistical reports and scientific journals are widely used in information technology. Accurate detection of table areas in document images is an essential prerequisite for tasks such as information extraction. However, because tables vary widely in shape and size, existing table detection methods adapted from general object detection algorithms have not yet achieved satisfactory results. Incorrect detections can lead to the loss of critical information. Methods: We therefore propose a novel end-to-end trainable deep network that uses a self-supervised pre-trained transformer for feature extraction to minimize incorrect detections. To better handle table areas of different shapes and sizes, we add a dual-branch context content attention module (DCCAM) to the high-dimensional features to extract context content information, thereby enhancing the network's ability to learn shape features. For feature fusion at different scales, we replace the original 3×3 convolution with a multilayer residual module, which carries enhanced gradient flow information and improves feature representation and extraction capability. Results: We evaluated our method on public document datasets and compared it with previous methods; our method achieved state-of-the-art results on evaluation metrics such as recall and F1-score. Code is available at https://github.com/YongZ-Lee/TD-DCCAM
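For readers unfamiliar with how recall and F1-score are usually computed for detection tasks, the following is a minimal sketch, not the authors' evaluation code. It assumes axis-aligned boxes given as (x1, y1, x2, y2), a greedy one-to-one matching between predictions and ground truth, and an IoU threshold of 0.5. The threshold, the matching strategy, and the names `iou` and `detection_scores` are all illustrative assumptions; the paper's actual protocol may differ.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_scores(
    preds: List[Box], gts: List[Box], iou_threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one page.

    Each prediction is greedily matched to the unmatched ground-truth
    table with the highest IoU; a match counts as a true positive only
    if that IoU reaches the threshold (0.5 here is an assumption).
    """
    matched = set()
    tp = 0
    for p in preds:
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp  # predictions with no matching table
    fn = len(gts) - tp    # tables that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical boxes: one correct detection, one false alarm, one missed table.
    preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
    gts = [(0, 0, 10, 10), (50, 50, 60, 60)]
    print(detection_scores(preds, gts))
```

A missed table lowers recall and a spurious box lowers precision; F1 is the harmonic mean of the two, so it penalizes both kinds of error.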

Published in Virtual Reality & Intelligent Hardware

ISSN: 2096-5796 (Print); 2666-1209 (Online)
Publisher: KeAi Communications Co., Ltd.
Country of publisher: China
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.keaipublishing.com/en/journals/virtual-reality-and-intelligent-hardware/
