MATEC Web of Conferences (Jan 2019)
An Algorithm Rapidly Segmenting Chinese Sentences into Individual Words
Abstract
This paper proposes an improved Trie tree structure in which each node records position information for the characters that participate in word formation, and child nodes are located through a hash lookup. On this basis, the forward maximum matching (FMM) algorithm for Chinese word segmentation is optimized. During segmentation, an automaton mechanism determines whether the characters read so far form the longest word, which removes the need in standard FMM to repeatedly adjust (shorten) the candidate string according to word length. The time complexity of the algorithm is 1.33, and comparative tests show that it segments text quickly. The FMM algorithm built on the improved Trie tree structure therefore increases Chinese word segmentation speed, especially when the dictionary structure must be updated in real time.
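The idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that a Python `dict` stands in for the hash-based child lookup, and it reads the node's "position information" as the character's 1-based position (`depth`) within the words passing through that node; that reading, and all names here (`TrieNode`, `Trie`, `longest_match`, `fmm_segment`), are hypothetical. Segmentation walks the trie one character at a time, like an automaton, and remembers the last position where a complete word ended. This means the candidate string is never shortened and looked up again.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Child nodes keyed by character; a dict provides hash-based child lookup.
    children: dict[str, TrieNode] = field(default_factory=dict)
    # True if the path from the root to this node spells a dictionary word.
    is_word: bool = False
    # Hypothetical reading of "position information": the character's
    # 1-based position within the words that pass through this node.
    depth: int = 0


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for w in words:
            self.add_word(w)

    def add_word(self, word: str) -> None:
        # An update only touches the nodes along one path, so words can be
        # added while the segmenter is in use.
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = TrieNode(depth=node.depth + 1)
                node.children[ch] = nxt
            node = nxt
        node.is_word = True

    def longest_match(self, text: str, start: int) -> int:
        # Automaton-style walk: follow one edge per character and record the
        # end of the longest dictionary word seen so far.
        node = self.root
        best_end = start + 1  # fall back to a single-character token
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break  # no dictionary word continues this prefix
            if node.is_word:
                best_end = i + 1
        return best_end


def fmm_segment(text: str, trie: Trie) -> list[str]:
    # Forward maximum matching: at each position take the longest dictionary
    # word; characters not covered by the dictionary become single tokens.
    tokens = []
    pos = 0
    while pos < len(text):
        end = trie.longest_match(text, pos)
        tokens.append(text[pos:end])
        pos = end
    return tokens


trie = Trie(["研究", "研究生", "生命", "起源"])
print(fmm_segment("研究生命起源", trie))  # ['研究生', '命', '起源']
```

The output shows the usual greedy behaviour of FMM: "研究生" is taken as the longest match, even though "研究 / 生命" would be the intended reading. Because `add_word` only extends a single path, dictionary updates require no rebuild. This corresponds to the real-time update case mentioned in the abstract.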
Keywords