Toward Semi-Supervised Graphical Object Detection in Document Images

Goutham Kallempudi; Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal

doi:10.3390/fi14060176

Future Internet (Jun 2022)

Affiliations

Goutham Kallempudi: Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
Khurram Azeem Hashmi: Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
Alain Pagani: German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany
Marcus Liwicki: Department of Computer Science, Luleå University of Technology, 97187 Luleå, Sweden
Didier Stricker: Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany
Muhammad Zeshan Afzal: Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany

DOI: https://doi.org/10.3390/fi14060176
Journal volume & issue: Vol. 14, no. 6, p. 176

Abstract

Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised methods based on deep neural networks have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled data for training. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. Our method is based on the recently proposed Soft Teacher mechanism and examines how small percentages of labeled data affect the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, while reaching a total mAP similar to that of the Faster-RCNN baseline. Moreover, our model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.

Published in Future Internet

ISSN: 1999-5903 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/futureinternet/
