Applied Sciences (Mar 2024)
An Unsupervised Character Recognition Method for Tibetan Historical Document Images Based on Deep Learning
Abstract
As there is a lack of public mark samples of Tibetan historical document image characters at present, this paper proposes an unsupervised Tibetan historical document character recognition method based on deep learning (UD-CNN). Firstly, using the Tibetan historical document character component, the Tibetan historical document character sample data set is constructed for model-aided training. Then, the character baseline information is introduced, and a fine-grained feature learning strategy is proposed. For the samples above and below the baseline, the Up-CNN recognition model and Down-CNN recognition model are established. The convolution neural network model is trained and optimized for the samples above and below the baseline, respectively, to improve the recognition accuracy. The experimental results show that the proposed method obviously affects the unmarked character classification and recognition of real Tibetan historical document images. The recognition rate of Top5 can reach 92.94%, and the recognition rate of Top1 can be increased from 82.25% to 87.27% using the CNN model only.
Keywords