IEEE Access (Jan 2022)

Vietnamese Document Analysis: Dataset, Method and Benchmark Suite

  • Khang Nguyen,
  • An Nguyen,
  • Nguyen D. Vo,
  • Tam V. Nguyen

DOI
https://doi.org/10.1109/ACCESS.2022.3211069
Journal volume & issue
Vol. 10
pp. 108046 – 108066

Abstract

Document image understanding is becoming more valuable as the volume of digital documents grows daily and the demand for automation rises. Object detection plays a significant role in locating key objects and layout elements in document images, which contributes to a clearer understanding of the documents. Nonetheless, previous research has focused mainly on English document images, and studies on Vietnamese document images remain limited. In this study, we extensively benchmark state-of-the-art object detectors and analyze the performance of each method on Vietnamese document images. We also investigate the effectiveness of four different loss functions on the evaluated object detection methods. Extensive experiments are conducted on the UIT-DODV dataset to support an in-depth discussion of the results.

Keywords