Jisuanji kexue (Apr 2022)

Chinese Short Text Classification Algorithm Based on Hybrid Features of Characters and Words

  • LIU Shuo, WANG Geng-run, PENG Jian-hua, LI Ke

DOI
https://doi.org/10.11896/jsjkx.210200027
Journal volume & issue
Vol. 49, no. 4
pp. 282 – 287

Abstract

The rapid development of information technology has led to massive amounts of Chinese short-text data on the Internet. As such, using classification technology to mine valuable information from this data is a current research hotspot. Compared with Chinese long texts, short texts contain fewer words, more ambiguity and more irregular information, which makes text feature extraction and representation a challenge. For this reason, a Chinese short text classification algorithm based on a deep neural network model with hybrid character and word features is proposed. First, the character vectors and word vectors of a Chinese short text are computed separately. Then, their features are extracted and fused. Finally, the classification task is completed by a fully connected layer and a softmax layer. Test results on the public THUCNews news dataset show that the algorithm outperforms the mainstream TextCNN, BiGRU, BERT and ERNIE_BiGRU comparison models in terms of accuracy, recall and F1 score. It is therefore effective for short text classification.
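The pipeline outlined in the abstract (separate character and word vectors, feature extraction, fusion, then a fully connected layer and softmax) can be illustrated with a minimal NumPy sketch. This is not the authors' model: the abstract does not specify the feature extractors or the fusion method, so mean pooling and concatenation are used here purely as placeholders, and every name, dimension and vocabulary size below is a hypothetical choice with random, untrained weights.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration (not from the paper).
CHAR_VOCAB, WORD_VOCAB = 100, 50
D_CHAR, D_WORD, NUM_CLASSES = 8, 12, 4

rng = np.random.default_rng(0)
params = {
    "char_emb": rng.normal(size=(CHAR_VOCAB, D_CHAR)),  # character embedding table
    "word_emb": rng.normal(size=(WORD_VOCAB, D_WORD)),  # word embedding table
    "W": rng.normal(size=(D_CHAR + D_WORD, NUM_CLASSES)),  # fully connected weights
    "b": np.zeros(NUM_CLASSES),                             # fully connected bias
}

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify(char_ids, word_ids, params):
    """Return class probabilities for one short text.

    char_ids / word_ids are integer index arrays for the same text,
    tokenized at character level and word level respectively.
    """
    char_vecs = params["char_emb"][char_ids]  # shape (n_chars, D_CHAR)
    word_vecs = params["word_emb"][word_ids]  # shape (n_words, D_WORD)

    # Placeholder feature extraction (assumption): mean pooling per branch.
    char_feat = char_vecs.mean(axis=0)
    word_feat = word_vecs.mean(axis=0)

    # Placeholder fusion (assumption): simple concatenation.
    fused = np.concatenate([char_feat, word_feat])

    # Fully connected layer followed by softmax, as named in the abstract.
    logits = fused @ params["W"] + params["b"]
    return softmax(logits)

# One text seen as four characters and two words (indices are made up).
probs = classify(np.array([3, 17, 42, 8]), np.array([5, 11]), params)
predicted_class = int(np.argmax(probs))
```

In a trained model the two pooling steps would be replaced by learned encoders, and the weights would be fitted on labeled data; the sketch only shows how the two granularities feed a single classifier.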

Keywords