DocCreator: A New Software for Creating Synthetic Ground-Truthed Document Images

Journal of Imaging. 2017;3(4):62 DOI 10.3390/jimaging3040062

 

Journal Title: Journal of Imaging

ISSN: 2313-433X (Online)

Publisher: MDPI AG

LCC Subject Category: Technology: Photography | Medicine: Medicine (General): Computer applications to medicine. Medical informatics | Science: Mathematics: Instruments and machines: Electronic computers. Computer science

Country of publisher: Switzerland

Language of fulltext: English

Full-text formats available: PDF, HTML, XML

 

AUTHORS

Nicholas Journet (Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France)
Muriel Visani (Laboratoire Informatique, Image et Interaction (L3i), Université de La Rochelle, 17000 La Rochelle, France)
Boris Mansencal (Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France)
Kieu Van-Cuong (LIPADE Laboratory, Paris Descartes University, 45, rue des Saints-Pères, 75270 Paris, CEDEX 6, France)
Antoine Billy (Laboratoire Bordelais de Recherche en Informatique UMR 5800, Université de Bordeaux, CNRS, Bordeaux INP, 33400 Talence, France)

EDITORIAL INFORMATION

Blind peer review

Time From Submission to Publication: 7 weeks

 

ABSTRACT

Most digital libraries that offer user-friendly interfaces for quick and intuitive access to their resources rely on Document Image Analysis and Recognition (DIAR) methods. Such methods need ground-truthed document images for evaluation and comparison and, in some cases, for training. With the advent of deep learning-based approaches in particular, the required size of annotated document datasets appears to keep growing. Manually annotating real documents has many drawbacks, so reliably annotated datasets are often small. To circumvent those drawbacks and enable the generation of massive ground-truthed data with high variability, we present DocCreator, a multi-platform, open-source software tool that can create many synthetic document images with controlled ground truth. DocCreator has been used in various experiments, which showed the value of using such synthetic images to enrich the training stage of DIAR tools.