Journal of King Saud University: Computer and Information Sciences (Mar 2024)

Punctuation and lexicon aid representation: A hybrid model for short text sentiment analysis on social media platform

  • Zhenyu Li,
  • Zongfeng Zou

Journal volume & issue
Vol. 36, no. 3
p. 102010

Abstract


Sentiment analysis is a means of measuring user experience on social media. Since the emergence of pre-trained models, approaches to text classification have become increasingly homogeneous, with no significant further gains in accuracy. We therefore present PLASA, a hybrid model that classifies the sentiment polarity of short comments, particularly barrages (danmaku, the comments overlaid on videos). PLASA introduces a collaborative attention module that integrates relative-position information and knowledge from summarized lexicons to better adjust the relationships between word representations. We evaluate the model on three newly curated sentiment analysis datasets: SentiTikTok-2023 (4613 items), SentiBilibili-2023 (7755 items), and SentiWeibo-2023 (5614 items). Although comment length varies across the datasets, all three share a consistent punctuation percentage of approximately 13%. In its best-performing configuration, PLASA shows notable improvements over both the baseline and commonly used models, achieving micro-F1 scores of 93.94%, 90.34%, and 88.79% on the three datasets, respectively. We also observe that the representation capacity of the pre-trained model decreases as text length increases; our ablation study confirms that the proposed collaborative attention module effectively addresses this limitation.
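The abstract does not say how the collaborative attention module combines relative position and lexicon knowledge. One common pattern, shown here only as a hedged sketch and not as PLASA's actual module, is to add bias terms to the attention logits: a bias indexed by token distance plus a bonus for tokens found in a sentiment lexicon. The distance-bias table `rel_bias`, the scalar `lexicon_weight`, the toy lexicon, the embeddings, and all function names are hypothetical assumptions; queries, keys, and values are reused directly without learned projections to keep the sketch small.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: NOT the PLASA collaborative attention module, only an
# illustration of adding relative-position and lexicon biases to attention logits.


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_attention_weights(
    queries: List[List[float]],
    keys: List[List[float]],
    rel_bias: List[float],
    lexicon_scores: List[float],
    lexicon_weight: float = 1.0,
) -> List[List[float]]:
    """Return an n x n attention matrix whose rows each sum to 1.

    logit[i][j] = q_i . k_j / sqrt(d)                  (scaled dot product)
                + rel_bias[min(|i - j|, R - 1)]        (assumed distance bias, R = len(rel_bias))
                + lexicon_weight * |lexicon_scores[j]| (assumed lexicon prior on key j)
    """
    n, d = len(queries), len(queries[0])
    weights = []
    for i in range(n):
        logits = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(queries[i], keys[j])) / math.sqrt(d)
            dist = min(abs(i - j), len(rel_bias) - 1)  # clip long distances
            logits.append(dot + rel_bias[dist] + lexicon_weight * abs(lexicon_scores[j]))
        weights.append(softmax(logits))
    return weights


def attend(weights: List[List[float]], values: List[List[float]]) -> List[List[float]]:
    """Weighted sum of value vectors for each query position."""
    dim = len(values[0])
    return [[sum(w * v[c] for w, v in zip(row, values)) for c in range(dim)] for row in weights]


# Toy usage: embeddings and the polarity lexicon are made up for illustration.
tokens = ["the", "movie", "is", "great"]
toy_lexicon = {"great": 0.9}  # hypothetical lexicon entry
lex = [toy_lexicon.get(t, 0.0) for t in tokens]
emb = [[0.1, 0.0], [0.3, 0.2], [0.0, 0.1], [0.2, 0.4]]
rel = [0.5, 0.2, 0.0]  # bias for distance 0, 1, and >= 2

w_plain = biased_attention_weights(emb, emb, rel, lex, lexicon_weight=0.0)
w_lex = biased_attention_weights(emb, emb, rel, lex, lexicon_weight=2.0)
context = attend(w_lex, emb)  # lexicon-aware representation for each token
```

Because the biases are added before the softmax, each row stays a proper probability distribution, and a lexicon word such as "great" receives more attention from every position regardless of its distance.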
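The reported punctuation percentage of about 13% does not state its unit. As a hedged sketch, assuming it means the share of non-whitespace characters whose Unicode category is punctuation (`P*`), pooled over an entire dataset, it could be computed as below. Full-width Chinese marks such as "，" and "！" fall in these categories; emoji (category `So`) do not. A token-level definition would give different numbers, and the function names are hypothetical.

```python
import unicodedata
from typing import Iterable


def is_punct(ch: str) -> bool:
    """True if the character's Unicode general category is punctuation (Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return unicodedata.category(ch).startswith("P")


def punctuation_ratio(text: str) -> float:
    """Share of non-whitespace characters in one comment that are punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(1 for c in chars if is_punct(c)) / len(chars)


def corpus_punctuation_ratio(comments: Iterable[str]) -> float:
    """Pooled ratio over a dataset (character-weighted, not a mean of per-comment ratios)."""
    total = punct = 0
    for text in comments:
        chars = [c for c in text if not c.isspace()]
        total += len(chars)
        punct += sum(1 for c in chars if is_punct(c))
    return punct / total if total else 0.0


# Toy comments, not taken from the datasets.
sample = ["哈哈哈！！", "好！", "Great movie!"]
ratio = corpus_punctuation_ratio(sample)
```

Pooling by characters weights long comments more heavily than averaging per-comment ratios would; which convention the authors used is not stated in the abstract.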
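Micro-F1, the metric reported above, pools true positives, false positives, and false negatives across all classes before computing precision and recall. For single-label classification, every wrong prediction counts once as a false positive and once as a false negative, so micro-F1 equals accuracy. The sketch below illustrates this; the label names are hypothetical, since the abstract does not list the polarity classes.

```python
from typing import Hashable, Sequence


def micro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1: sum TP/FP/FN over all labels, then compute F1 once."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    for label in labels:
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1  # correct prediction of this label
            elif p == label:
                fp += 1  # predicted this label, gold is different
            elif t == label:
                fn += 1  # gold is this label, prediction is different
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical polarity labels: 2 of 4 predictions are correct.
gold = ["pos", "neg", "neu", "pos"]
pred = ["pos", "neg", "pos", "neg"]
score = micro_f1(gold, pred)  # 0.5, identical to accuracy here
```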

Keywords