Pipelining Semantic Expansion and Noise Filtering for Sentiment Analysis of Short Documents – CluSent Method

Felipe Viegas; Sergio Canuto; Washington Cunha; Celso França; Claudio Valiense; Guilherme Fonseca; Ana Machado; Leonardo Rocha; Marcos André Gonçalves

doi:10.5753/jis.2024.4117

Journal on Interactive Systems (Jun 2024)

Affiliations

Felipe Viegas: Universidade Federal de Minas Gerais
Sergio Canuto: Instituto Federal de Goiás
Washington Cunha: Universidade Federal de Minas Gerais
Celso França: Universidade Federal de Minas Gerais
Claudio Valiense: Universidade Federal de Minas Gerais
Guilherme Fonseca: Universidade Federal de São João del-Rei
Ana Machado: Universidade Federal de São João del-Rei
Leonardo Rocha: Universidade Federal de São João del-Rei
Marcos André Gonçalves: Universidade Federal de Minas Gerais

DOI: https://doi.org/10.5753/jis.2024.4117
Journal volume & issue: Vol. 15, no. 1

Abstract

Building effective sentiment models is especially hard for short texts, which carry little information. Enriching short texts with semantic relationships helps capture affective nuances and improve model effectiveness, but it can also introduce noise. This article introduces CluSent, a novel approach designed for customized, dataset-oriented sentiment analysis. CluSent builds on the CluWords concept, a powerful representation of semantically related words. To address information scarcity and noise, CluSent (i) leverages the semantic neighborhood of pre-trained word embedding representations to enrich the document representation and (ii) introduces dataset-specific filtering and weighting mechanisms to manage noise. These mechanisms use part-of-speech and polarity/intensity information from lexicons. In an extensive experimental evaluation spanning 19 datasets and five state-of-the-art baselines, including modern transformer architectures, CluSent was the best method in the majority of scenarios (28 out of 38 possibilities), with performance gains of up to 14% over the strongest baselines.
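
The two ideas named in the abstract, semantic expansion followed by lexicon-based filtering and weighting, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the embedding values, the polarity lexicon, the 0.9 similarity threshold, the weighting formula (similarity times absolute intensity), and all function names are assumptions made for illustration, and the part-of-speech filter mentioned in the abstract is omitted.

```python
import math

# Hypothetical 2-d "pre-trained" embeddings; real ones have hundreds of dimensions.
EMBEDDINGS = {
    "good": (0.9, 0.1),
    "great": (0.85, 0.2),
    "bad": (-0.8, 0.3),
    "movie": (0.1, 0.9),
    "film": (0.15, 0.95),
}

# Hypothetical polarity lexicon: word -> intensity in [-1, 1].
LEXICON = {"good": 0.7, "great": 0.9, "bad": -0.8}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def neighbors(word, threshold=0.9):
    """Other vocabulary words whose similarity to `word` is at least `threshold`."""
    if word not in EMBEDDINGS:
        return {}
    target = EMBEDDINGS[word]
    result = {}
    for other, vec in EMBEDDINGS.items():
        if other == word:
            continue
        sim = cosine(target, vec)
        if sim >= threshold:
            result[other] = sim
    return result


def represent(tokens, threshold=0.9):
    """Expand each token with its neighbors, then filter and weight the additions.

    Assumed rule: an original token keeps weight 1.0; an added neighbor is kept
    only if the lexicon lists it, weighted by similarity * |intensity|.
    """
    features = {}
    for token in tokens:
        features[token] = features.get(token, 0.0) + 1.0
        for other, sim in neighbors(token, threshold).items():
            if other in LEXICON:  # noise filter: drop neighbors with no polarity
                weight = sim * abs(LEXICON[other])
                features[other] = features.get(other, 0.0) + weight
    return features


if __name__ == "__main__":
    print(represent(["good", "movie"]))
```

In this toy setup, "good" pulls in its close neighbor "great" with a weight scaled by the lexicon intensity, while "film", although close to "movie" in the embedding space, is discarded because the assumed lexicon assigns it no polarity.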

Published in Journal on Interactive Systems

ISSN: 2763-7719 (Online)
Publisher: Brazilian Computer Society
Country of publisher: Brazil
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://sol.sbc.org.br/journals/index.php/jis/
