CAAI Transactions on Intelligence Technology (Dec 2022)

The early Japanese books reorganization by combining image processing and deep learning

  • Bing Lyu,
  • Hengyi Li,
  • Ami Tanaka,
  • Lin Meng

DOI
https://doi.org/10.1049/cit2.12104
Journal volume & issue
Vol. 7, no. 4
pp. 627 – 643

Abstract

Many early Japanese books record a large amount of information about historical politics, economics, culture, and other subjects, making them valuable legacies that are still waiting to be reorganised. However, a large number of these books are written in Kuzushiji, a handwritten cursive script that is no longer in use today and is readable by only a few experts. Researchers are therefore trying to detect and recognise the characters in these books with modern techniques. Unfortunately, characteristics of Kuzushiji such as Connect‐Separate‐characters and Many‐variation hinder reorganisation assisted by modern techniques. Connect‐Separate‐characters refers to cases in which some characters are connected to each other or a single character is separated into unconnected parts, which makes character detection hard. Many‐variation, a typical characteristic of Kuzushiji, refers to the same character having several variations even when written by the same person in the same book at the same time, which increases the difficulty of character recognition. This paper therefore aims to construct an early Japanese book reorganisation system by combining image processing and deep learning techniques. Experiments were conducted on two early Japanese books. For character detection, the final Recall, Precision, and F‐value reach 79.8%, 80.3%, and 80.0%, respectively. For deep learning based character recognition, the Top3 accuracy reaches 69.52%, and the highest recognition rate reaches 82.57%, which verifies the effectiveness of the proposal.
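
As a quick consistency check on the detection figures, the sketch below recomputes the F‐value from the reported Precision and Recall. It assumes the F‐value is the standard F1 score (the harmonic mean of Precision and Recall). The abstract does not state the formula, and the function name `f_value` is hypothetical.

```python
def f_value(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed F1 definition).

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract for character detection.
reported_precision = 0.803
reported_recall = 0.798

f = f_value(reported_precision, reported_recall)
print(f"{f:.1%}")  # prints "80.0%"
```

Under that assumption, the recomputed value agrees with the reported 80.0% after rounding to one decimal place.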

Keywords