IEEE Access (Jan 2020)

A Hybrid Classification Method via Character Embedding in Chinese Short Text With Few Words

  • Yi Zhu,
  • Yun Li,
  • Yongzheng Yue,
  • Jipeng Qiang,
  • Yunhao Yuan

DOI
https://doi.org/10.1109/ACCESS.2020.2994450
Journal volume & issue
Vol. 8
pp. 92120–92128

Abstract

Recent decades have witnessed significant progress in short text classification research. However, most existing methods focus only on texts that contain dozens of words, such as tweets or microblog posts, and do not take short texts with few words, such as news headlines or invoice names, into consideration. Meanwhile, contemporary short text classification methods either expand the features of a short text with an external corpus or learn the feature representation from all the texts, without fully considering the differences between the words within a short text. Notably, unlike document classification or traditional short text classification, the classification of a short text with few words is usually determined by a few specific keywords. To address these problems, this paper proposes a hybrid classification method combining an Attention mechanism and Feature selection via Character embedding for Chinese short texts with few words, called AFC. More specifically, character embeddings are first computed to represent Chinese short texts with few words, which takes full advantage of the short text information without requiring an external corpus. Second, an attention-based LSTM is introduced to project the data into a weighted feature representation space, which gives the keywords that determine the class greater weight. Furthermore, the semantic similarity between the content and the class label information is calculated for feature selection, which reduces the possible negative influence of redundant information on classification. Experiments on real-world datasets demonstrate the effectiveness of our method compared with other competing methods.
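The abstract only outlines the architecture, so the following is a minimal, illustrative PyTorch sketch (not the authors' implementation) of the pipeline it describes: character-level embeddings fed to a bidirectional LSTM whose hidden states are pooled with an attention layer before classification. All names and hyperparameters here (CHAR_VOCAB, EMB_DIM, HIDDEN, NUM_CLASSES) are assumptions, and the semantic-similarity feature selection step is omitted.

# Sketch of a character-embedding + attention-LSTM classifier for short Chinese texts.
# Sizes and names are illustrative assumptions, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

CHAR_VOCAB = 5000    # assumed size of the Chinese character vocabulary
EMB_DIM = 128        # assumed character-embedding dimension
HIDDEN = 64          # assumed LSTM hidden size
NUM_CLASSES = 10     # assumed number of short-text categories

class AttnCharLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(CHAR_VOCAB, EMB_DIM, padding_idx=0)
        self.lstm = nn.LSTM(EMB_DIM, HIDDEN, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * HIDDEN, 1)       # scores each character position
        self.out = nn.Linear(2 * HIDDEN, NUM_CLASSES)

    def forward(self, char_ids):                   # char_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(char_ids))     # (batch, seq_len, 2*HIDDEN)
        # Attention weights emphasize the few key characters that decide the class.
        w = F.softmax(self.attn(h).squeeze(-1), dim=1)      # (batch, seq_len)
        context = torch.bmm(w.unsqueeze(1), h).squeeze(1)   # weighted sum of states
        return self.out(context)                   # class logits

model = AttnCharLSTM()
dummy = torch.randint(1, CHAR_VOCAB, (4, 12))      # e.g. 4 headlines of 12 characters
print(model(dummy).shape)                          # torch.Size([4, 10])

Working at the character level avoids the word segmentation errors that very short Chinese texts tend to cause, and the learned attention weights give a per-character weighting in the spirit of the keyword emphasis the abstract describes.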

Keywords