IEEE Access (Jan 2018)

Capsules Based Chinese Word Segmentation for Ancient Chinese Medical Books

  • Si Li,
  • Mingzheng Li,
  • Yajing Xu,
  • Zuyi Bao,
  • Lu Fu,
  • Yan Zhu

DOI
https://doi.org/10.1109/ACCESS.2018.2881280
Journal volume & issue
Vol. 6
pp. 70874 – 70883

Abstract

Neural network models are widely used for the Chinese word segmentation task. The recently proposed capsule architecture addresses some defects of convolutional neural networks. In this paper, we are the first to introduce the capsule architecture to Chinese word segmentation, using capsules as the neural units. Before running the routing algorithm, we apply a sliding capsule window to select features extracted by the primary capsule layer; this window adapts the capsule architecture to the sequence labeling task. Experimental results show that the proposed capsule-based Chinese word segmentation model achieves performance competitive with previous state-of-the-art methods. Ancient Chinese medical books record a great deal of valuable experience from ancient medical practitioners. However, research on automatic text analysis of ancient Chinese medical documents is only beginning. Because annotated data for Chinese medicine is lacking, we develop a word segmentation guideline for ancient Chinese medical documents and select 30 ancient Chinese medical books across 10 genres to build an annotated dataset. With this annotated data, we develop a segmenter for ancient Chinese medical text. Experiments show that our model achieves F1 scores of 94.9% on Chinese Treebank 6.0 and 81.4% on Ancient Chinese Medical Books.
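
For readers unfamiliar with how segmentation F1 scores such as those above are usually computed, the following is a minimal sketch of the conventional word-level metric. It assumes that a predicted word counts as correct only when both of its boundaries match a gold word. The abstract does not describe the evaluation script, so the function names `to_spans` and `segmentation_f1` and the toy sentence are illustrative assumptions, not the authors' code or data.

```python
def to_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f1(gold_sentences, pred_sentences):
    """Word-level precision, recall and F1 over paired segmented sentences.

    Assumption: a predicted word is correct only if its exact span
    appears in the gold segmentation of the same sentence.
    """
    correct = gold_total = pred_total = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        gold_spans = to_spans(gold)
        pred_spans = to_spans(pred)
        correct += len(gold_spans & pred_spans)
        gold_total += len(gold_spans)
        pred_total += len(pred_spans)
    precision = correct / pred_total if pred_total else 0.0
    recall = correct / gold_total if gold_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy example: one gold sentence and one predicted segmentation.
gold = [["人参", "三", "两"]]
pred = [["人参", "三两"]]
precision, recall, f1 = segmentation_f1(gold, pred)
print(precision, recall, f1)
```

In this toy case only the word 人参 matches on both boundaries, so one of two predicted words and one of three gold words are correct. Comparing spans rather than word strings avoids crediting a word that appears at the wrong position in the sentence.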

Keywords