Electronics (Jun 2023)

Short Text Classification Based on Hierarchical Heterogeneous Graph and LDA Fusion

  • Xinlan Xu,
  • Bo Li,
  • Yuhao Shen,
  • Bing Luo,
  • Chao Zhang,
  • Fei Hao

DOI
https://doi.org/10.3390/electronics12122560
Journal volume & issue
Vol. 12, no. 12
p. 2560

Abstract

The proliferation of short texts resulting from the rapid advancement of social networks, online communication, and e-commerce has created a pressing need for short text classification in various applications. This paper presents a novel approach to short text classification that combines a hierarchical heterogeneous graph with latent Dirichlet allocation (LDA) fusion. Our method first models the short text dataset as a hierarchical heterogeneous graph, which incorporates more syntactic and semantic information through a word graph, a part-of-speech (POS) tag graph, and an entity graph. We then connect the representations of these three feature graphs to derive a comprehensive feature vector for the text. Finally, we use the LDA topic model to adjust the feature weights, enhancing the effectiveness of short text extension. Our experiments demonstrate that the proposed approach performs well in English short text classification. In Chinese short text classification, it is slightly inferior to the LDA + TF-IDF method but still achieves promising results.
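
The fusion and reweighting steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the three representations are connected or how LDA adjusts the weights. The sketch assumes plain concatenation for the first step and, for the second, assumes each word's base weight is scaled by one plus its topic-mixture probability for the document. The function names `concat_features` and `topic_adjusted_weights` and all inputs are hypothetical.

```python
from typing import Dict, List, Sequence


def concat_features(
    word_vec: Sequence[float],
    pos_vec: Sequence[float],
    entity_vec: Sequence[float],
) -> List[float]:
    """Join the word, POS-tag, and entity representations into one vector.

    Assumption: the representations are connected by simple concatenation.
    """
    return list(word_vec) + list(pos_vec) + list(entity_vec)


def topic_adjusted_weights(
    base_weights: Dict[str, float],
    doc_topics: Sequence[float],
    topic_word: Sequence[Dict[str, float]],
) -> Dict[str, float]:
    """Scale each word's base weight by (1 + p(word | document)).

    Assumption (not from the paper): p(word | document) is the LDA mixture
    sum_k p(topic k | document) * p(word | topic k). Words absent from a
    topic's distribution contribute zero for that topic.
    """
    adjusted = {}
    for word, weight in base_weights.items():
        p_word = sum(
            p_topic * topic_word[k].get(word, 0.0)
            for k, p_topic in enumerate(doc_topics)
        )
        adjusted[word] = weight * (1.0 + p_word)
    return adjusted


# Toy inputs, chosen only for illustration.
fused = concat_features([1.0, 2.0], [3.0], [4.0, 5.0])
weights = topic_adjusted_weights(
    base_weights={"phone": 1.0, "the": 1.0},
    doc_topics=[0.5, 0.5],
    topic_word=[{"phone": 0.5}, {"phone": 0.25}],
)
```

In this toy case, the topical word "phone" receives a larger weight than "the", which appears in neither topic's distribution and therefore keeps its base weight.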

Keywords