Ho Chi Minh City Open University Journal of Science - Engineering and Technology (Apr 2022)

Khmer printed character recognition using attention-based Seq2Seq network

  • Rina Buoy,
  • Nguonly Taing,
  • Sovisal Chenda,
  • Sokchea Kor

DOI
https://doi.org/10.46223/HCMCOUJS.tech.en.12.1.2217.2022
Journal volume & issue
Vol. 12, no. 1
pp. 3 – 16

Abstract

This paper presents an end-to-end deep convolutional recurrent neural network solution for the Khmer optical character recognition (OCR) task. The proposed solution uses a sequence-to-sequence (Seq2Seq) architecture with an attention mechanism. The encoder extracts visual features from an input text-line image via layers of convolutional blocks and a layer of gated recurrent units (GRU). The features are encoded in a single context vector and a sequence of hidden states, which are fed to the decoder; the decoder emits one character at a time until a special end-of-sentence (EOS) token is reached. The attention mechanism allows the decoder network to adaptively select relevant parts of the input image when predicting each target character. The Seq2Seq Khmer OCR network is trained on a large collection of computer-generated text-line images in multiple common Khmer fonts. Complex data augmentation is applied to both the training and validation datasets. The proposed model outperforms the state-of-the-art Tesseract OCR engine for the Khmer language on a validation set of 6400 augmented images, achieving a character error rate (CER) of 0.7% versus 35.9%.
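
The abstract reports results as a character error rate (CER) but does not state how it is computed. The sketch below assumes the common definition: the total character-level edit distance (insertions, deletions, substitutions) between predicted and reference text lines, divided by the total number of reference characters. The function names `edit_distance` and `character_error_rate` are hypothetical, and treating each Unicode code point as one character is an assumption; the paper may count Khmer characters differently.

```python
from typing import Sequence


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per Unicode code point."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Assumed CER: summed edit distance over summed reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of lines")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars


if __name__ == "__main__":
    # Toy example: one substitution in a four-character line.
    print(character_error_rate(["abcd"], ["abxd"]))
```

Under this definition a CER of 0.7% corresponds to roughly 7 character edits per 1000 reference characters. Note that the value can exceed 100% when the prediction is much longer than the reference.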

Keywords