Sentiment Classification of Crowdsourcing Participants’ Reviews Text Based on LDA Topic Model

Yanrong Huang; Rui Wang; Bin Huang; Bo Wei; Shu Li Zheng; Min Chen

doi:10.1109/ACCESS.2021.3101565

IEEE Access (Jan 2021)

Affiliations

Yanrong Huang: ORCiD; College of Economics and Management, Zhejiang University of Water Resource and Electric Power, Hangzhou, China
Rui Wang: School of Economics and Management, Jiangxi University of Science and Technology, Ganzhou, China
Bin Huang: College of Economics and Management, Zhejiang University of Water Resource and Electric Power, Hangzhou, China
Bo Wei: ORCiD; School of Informatics Science and Technology, Zhejiang Sci-Tech University, Hangzhou, China
Shu Li Zheng: College of Economics and Management, Zhejiang University of Water Resource and Electric Power, Hangzhou, China
Min Chen: State Key Laboratory of Software Engineering, School of Computer Science, Wuhan University, Wuhan, China

DOI: https://doi.org/10.1109/ACCESS.2021.3101565
Journal volume & issue: Vol. 9
pp. 108131 – 108143

Abstract

The reviews received by crowdsourcing participants contain valuable knowledge, opinions, and preferences. They are an important basis for employers making trading decisions and for crowdsourcing participants improving their service level and quality. However, review text can contain both kinds of emotional polarity, and sentiment classification of review text with such fuzzy emotional boundaries has received insufficient attention. This paper proposes a supervised text sentiment classification method that uses Latent Dirichlet Allocation (LDA) to improve classification performance on review text with fuzzy sentiment boundaries. The data set consists of the review text of crowdsourcing participants on the Zhubajie platform, and text features are extracted with the N-gram, Word2vec, and TF-IDF algorithms. The LDA topic model is applied to expand the number of text features and to extract eight topics that affect employers’ sentiment tendencies. Text classifiers are constructed with the Support Vector Machine (SVM), Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and Extreme Gradient Boosting (XGBoost) algorithms, and the effectiveness of the sentiment classification methods is verified by ten-fold cross-validation and confusion matrices. Experimental results show that extending the features of review text with the LDA topic model effectively alleviates the classifier’s difficulty in distinguishing the sentiment category of text in which words of different emotional polarity coexist, and improves the classification of text with fuzzy emotional boundaries. With TF-IDF and LDA used to extract and expand text features, the GBDT sentiment classifier achieves an accuracy of 0.881, and its F1-measures on the second, third, fourth, and fifth category samples are 0.462, 0.571, 0.278, and 0.647, respectively. It outperforms the SVM, RF, and XGBoost classifiers and has the best classification performance.
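The abstract reports an overall accuracy alongside per-category F1-measures read from a confusion matrix. As a minimal sketch of how those two kinds of figures relate, the Python below computes both from a confusion matrix using the standard definitions. The function names and the three-class toy matrix are assumptions made up for illustration; they are not the paper's code or data.

```python
from typing import Dict, List


def per_class_scores(cm: List[List[int]]) -> Dict[int, Dict[str, float]]:
    """Precision, recall, and F1 for each class.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j (rows = true labels, columns = predictions).
    """
    n = len(cm)
    scores: Dict[int, Dict[str, float]] = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(cm[k]) - tp                       # true k, predicted as another class
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[k] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total if total else 0.0


# Hypothetical, imbalanced toy matrix (NOT data from the paper):
# class 0 dominates, classes 1 and 2 are rare and often confused.
toy_cm = [
    [90, 5, 5],
    [4, 3, 3],
    [6, 2, 2],
]

if __name__ == "__main__":
    print("accuracy:", round(accuracy(toy_cm), 3))
    for label, s in per_class_scores(toy_cm).items():
        print(label, {name: round(value, 3) for name, value in s.items()})
```

On this made-up matrix the accuracy is about 0.79 while the F1 values of the two rare classes are 0.3 and 0.2. This illustrates why a per-category F1-measure is worth reporting next to accuracy when some sentiment categories have few samples.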

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
