IEEE Access (Jan 2019)
Domain-Specific Chinese Word Segmentation Based on Bi-Directional Long-Short Term Memory Model
Abstract
Most current word segmentation methods are rule-based or rely on traditional machine learning. General-purpose word segmentation tools do not work well in specialized fields such as metallurgy, and domain-specific Chinese word segmentation has rarely been studied. In recent years, with the development of deep learning, neural networks have proved effective for Chinese word segmentation. However, this promising performance relies on large-scale training data. Neural networks with conventional architectures cannot achieve the desired results on low-resource datasets because labeled training data is scarce. Taking the field of metallurgy as an example, this paper proposes a domain-specific Chinese word segmentation method based on a bi-directional long short-term memory (Bi-directional LSTM) model. First, the word segmentation model is obtained by training the Bi-directional LSTM model on knowledge from both inside and outside the domain. Then, the model parameters are tuned and the label probability of each word is combined with a weight. Finally, the segmentation result is produced by a label inference layer. The experimental results show that the proposed method achieves better word segmentation results in the field of metallurgy.
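The last two steps (weighting the label probabilities, then inferring labels) can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, the tag set, or the inference algorithm, so everything here is an assumption made for illustration: a linear interpolation between in-domain and out-of-domain label probabilities, the common BMES tag set (Begin, Middle, End, Single), and Viterbi decoding as the label inference step. The names `combine`, `infer_labels`, and `segment` are hypothetical, and the probabilities are made-up inputs standing in for model outputs.

```python
import math

TAGS = ["B", "M", "E", "S"]
# Tag transitions permitted by the BMES scheme (assumed, not from the paper).
ALLOWED = {"B": {"M", "E"}, "M": {"M", "E"}, "E": {"B", "S"}, "S": {"B", "S"}}


def combine(p_in, p_out, weight):
    """Linearly interpolate two per-character label distributions.

    `weight` is the share given to the in-domain model (an assumed scheme).
    """
    return [
        {t: weight * a[t] + (1.0 - weight) * b[t] for t in TAGS}
        for a, b in zip(p_in, p_out)
    ]


def infer_labels(probs):
    """Viterbi search for the most probable valid BMES tag sequence."""
    if not probs:
        return []
    neg_inf = float("-inf")

    def log(p):
        return math.log(p) if p > 0 else neg_inf

    # A sentence can only start with B or S.
    score = {t: log(probs[0][t]) if t in ("B", "S") else neg_inf for t in TAGS}
    back = []
    for dist in probs[1:]:
        new_score, pointers = {}, {}
        for t in TAGS:
            # Best previous tag among those allowed to precede t.
            best_score, best_prev = max(
                (score[p], p) for p in TAGS if t in ALLOWED[p]
            )
            new_score[t] = best_score + log(dist[t])
            pointers[t] = best_prev
        score = new_score
        back.append(pointers)
    # A sentence can only end with E or S.
    path = [max(("E", "S"), key=lambda t: score[t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def segment(text, tags):
    """Cut the text after every E or S tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


# Made-up probabilities for the four characters of "高炉炼铁".
start = {"B": 0.90, "M": 0.03, "E": 0.03, "S": 0.04}
end = {"B": 0.03, "M": 0.03, "E": 0.90, "S": 0.04}
p_in = [start, end, start, end]  # in-domain model favors two-character words
p_out = [{"B": 0.1, "M": 0.1, "E": 0.1, "S": 0.7}] * 4  # favors single characters

text = "高炉炼铁"
print(segment(text, infer_labels(combine(p_in, p_out, 0.8))))
print(segment(text, infer_labels(combine(p_in, p_out, 0.0))))
```

In this toy setup, a weight close to 1 lets the in-domain distribution dominate and keeps domain terms together, while a weight of 0 falls back to the out-of-domain distribution alone. How the paper actually sets or tunes its weights is not stated in the abstract.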
Keywords