Investigating Semantic Differences in User-Generated Content by Cross-Domain Sentiment Analysis Means

Traian-Radu Ploscă; Christian-Daniel Curiac; Daniel-Ioan Curiac

doi:10.3390/app14062421

Applied Sciences (Mar 2024)

Affiliations

Traian-Radu Ploscă: Department of Automation and Applied Informatics, Politehnica University of Timisoara, V. Parvan 2, 300223 Timisoara, Romania
Christian-Daniel Curiac: Department of Computer and Information Technology, Politehnica University of Timisoara, V. Parvan 2, 300223 Timisoara, Romania
Daniel-Ioan Curiac: Department of Automation and Applied Informatics, Politehnica University of Timisoara, V. Parvan 2, 300223 Timisoara, Romania

DOI: https://doi.org/10.3390/app14062421
Journal volume & issue: Vol. 14, no. 6, p. 2421

Abstract

Sentiment analysis of domain-specific short messages (DSSMs) is challenging because such messages often contain field-specific terminology, jargon, and abbreviations. In this paper, we investigate the distinctive characteristics of user-generated content across multiple domains, with DSSMs as the focal point. With cross-domain models on the rise, we examine how accurately these models interpret the hidden meanings embedded in domain-specific terminology. Our investigation uses datasets from three community platforms: a Jira dataset for DSSMs, as it contains vocabulary specific to software engineering; a Twitter dataset for domain-independent short messages (DISMs), as it consists of everyday language; and a Reddit dataset as an intermediate case. Using machine learning techniques, we explore whether software engineering short messages differ notably from regular messages. To this end, we apply a cross-domain knowledge transfer approach together with the RoBERTa sentiment analysis technique to demonstrate that efficient models exist for addressing DSSM challenges across multiple domains. Our study reveals that DSSMs are semantically different from DISMs, as evidenced by the differences in F1 scores obtained by the models.
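
The F1-based comparison mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the three-class label set, the macro-averaging choice, the function names `macro_f1` and `transfer_gap`, and the toy labels are all assumptions made here for illustration, and the numbers are not results from the paper. The idea is to score the same target-domain test set twice, once with predictions from a model trained in-domain and once with predictions from a model trained on another domain, and then look at the F1 difference.

```python
# Hypothetical sketch: quantify a cross-domain gap as a difference in macro F1.
# Label set, averaging scheme, and toy data are assumptions, not paper results.

LABELS = ("negative", "neutral", "positive")


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def transfer_gap(y_true, in_domain_pred, cross_domain_pred):
    """In-domain macro F1 minus cross-domain macro F1 on the same test set."""
    return macro_f1(y_true, in_domain_pred) - macro_f1(y_true, cross_domain_pred)


# Toy gold labels for a target-domain test set (invented for illustration).
gold = ["negative", "negative", "neutral", "neutral", "positive", "positive"]
# Predictions from a hypothetical model trained on the same domain.
in_domain = list(gold)
# Predictions from a hypothetical model trained on a different domain.
cross_domain = ["neutral", "negative", "neutral", "neutral", "positive", "neutral"]

gap = transfer_gap(gold, in_domain, cross_domain)
print(f"macro-F1 gap: {gap:.3f}")
```

A larger gap suggests that the vocabulary of the target domain is not well covered by what the model learned in the source domain, which is the kind of evidence the abstract refers to when contrasting DSSMs with DISMs.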

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
