Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia

Made Windu Antara Kesiman; Dona Valy; Jean-Christophe Burie; Erick Paulus; Mira Suryani; Setiawan Hadi; Michel Verleysen; Sophea Chhun; Jean-Marc Ogier

doi:10.3390/jimaging4020043

Journal of Imaging (Feb 2018)

Affiliations

Made Windu Antara Kesiman: Laboratoire Informatique Image Interaction (L3i), Université de La Rochelle, 17042 La Rochelle, France
Dona Valy: Institute of Information and Communication Technologies, Electronic, and Applied Mathematics (ICTEAM), Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Jean-Christophe Burie: Laboratoire Informatique Image Interaction (L3i), Université de La Rochelle, 17042 La Rochelle, France
Erick Paulus: Department of Computer Science, Universitas Padjadjaran, Bandung 45363, Indonesia
Mira Suryani: Department of Computer Science, Universitas Padjadjaran, Bandung 45363, Indonesia
Setiawan Hadi: Department of Computer Science, Universitas Padjadjaran, Bandung 45363, Indonesia
Michel Verleysen: Institute of Information and Communication Technologies, Electronic, and Applied Mathematics (ICTEAM), Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium
Sophea Chhun: Department of Information and Communication Engineering, Institute of Technology of Cambodia, Phnom Penh, Cambodia
Jean-Marc Ogier: Laboratoire Informatique Image Interaction (L3i), Université de La Rochelle, 17042 La Rochelle, France

DOI: https://doi.org/10.3390/jimaging4020043
Journal volume & issue: Vol. 4, no. 2, p. 43

Abstract

This paper presents a comprehensive evaluation of the principal tasks in document image analysis (DIA), starting with binarization, text line segmentation, and isolated character/glyph recognition, and continuing to word recognition and transliteration, for a new and challenging collection of palm leaf manuscripts from Southeast Asia. The research both presents and uses a complete dataset of Southeast Asian palm leaf manuscripts covering three scripts: Khmer script from Cambodia, and Balinese and Sundanese scripts from Indonesia. For binarization, many methods are evaluated, up to the most recent ones from binarization competitions. For text line segmentation, the seam carving method is compared with a recent text line segmentation method for palm leaf manuscripts. For isolated character/glyph recognition, results are reported for a handcrafted feature extraction method, a neural network with unsupervised feature learning, and a Convolutional Neural Network (CNN) based method. Finally, a Recurrent Neural Network-Long Short-Term Memory (RNN-LSTM) based method is used for the word recognition and transliteration task. The results of all experiments provide the latest findings and a quantitative benchmark in palm leaf manuscript analysis for researchers in the DIA community.

Published in Journal of Imaging

ISSN: 2313-433X (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Photography; Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.mdpi.com/journal/jimaging
