Applied Sciences (May 2022)

Character Recognition in Endangered Archives: Shui Manuscripts Dataset, Detection and Application Realization

  • Minli Tang,
  • Shaomin Xie,
  • Mu He,
  • Xiangrong Liu

DOI
https://doi.org/10.3390/app12115361
Journal volume & issue
Vol. 12, no. 11
p. 5361

Abstract

Shui manuscripts are a historical testimony to the national identity and spirit of the Shui people. To address the lack of a high-quality Shui manuscripts dataset, we collected Shui manuscript images in the Shui area and enhanced them using several methods. The result is a well-labeled and sizable Shui manuscripts dataset, named Shuishu_T, which is the largest of its kind. We then applied object detection to the recognition of Shui manuscript characters. Specifically, we compared the advantages and disadvantages of Faster R-CNN, You Only Look Once (YOLO), and the Single Shot MultiBox Detector (SSD), and chose Faster R-CNN to detect and recognize Shui manuscript characters. We trained and tested Faster R-CNN on 111 classes of Shui manuscript characters and achieved an average recognition rate of 87.8%. Finally, we designed a WeChat applet that quickly identifies Shui manuscript characters in images obtained by scanning Shui manuscripts with a mobile phone. This work provides a basis for recognizing characters in Shui manuscripts on mobile terminals. Our research helps the intangible cultural heritage of the Shui people to be preserved, promoted, and shared, which is of great significance for the conservation and inheritance of Shui manuscripts.

Keywords