Lontar Komputer (Dec 2023)

The Influence Of Applying Stopword Removal And SMOTE On Indonesian Sentiment Classification

  • Arif Bijaksana Putra Negara

DOI
https://doi.org/10.24843/LKJITI.2023.v14.i03.p05
Journal volume & issue
Vol. 14, no. 3
pp. 172 – 185

Abstract


Information such as public opinions or responses can be obtained from tweets on Twitter. These opinions can be expressed as a sentiment, which can be positive, neutral, or negative. Sentiment analysis (opinion mining) on text can be performed through text classification. This research aims to determine the influence of applying Stopword Removal and SMOTE on a sentiment classification model for Indonesian tweets. The algorithms used are Logistic Regression and Random Forest. Based on the evaluation, the best classification model was obtained by combining the Random Forest algorithm with SMOTE, with an f1-score of 75.03%. Conversely, combining Random Forest with Stopword Removal produced the worst model, with an f1-score of 68.09%. Applying Stopword Removal lowered the f1-score for both algorithms, whereas applying SMOTE increased it. This is because Stopword Removal can discard information and alter the meaning of the processed tweets, causing them to lose their sentiment.
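The two preprocessing steps compared above can be illustrated with a minimal Python sketch. It is not the authors' code. The stopword list is hypothetical. The tweets are assumed to be already converted into numeric feature vectors, since the abstract does not name the representation. `smote` is a simplified version of SMOTE: it picks a random minority-class sample, finds its `k` nearest minority-class neighbours by Euclidean distance, and places a synthetic point at a random position on the segment between the two. The function names `remove_stopwords`, `smote`, and `balance_with_smote` are illustrative. Because the hypothetical list contains the negation `tidak` ("not"), the sketch also shows how stopword removal can strip a tweet's sentiment.

```python
import random
from collections import Counter

# Hypothetical stopword list for illustration only; the abstract does not
# say which Indonesian stopword list the study used.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "tidak"}


def remove_stopwords(text):
    """Lowercase, split on whitespace, and drop tokens found in STOPWORDS."""
    return " ".join(t for t in text.lower().split() if t not in STOPWORDS)


def smote(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: create n_new synthetic points, each lying between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def balance_with_smote(X, y, k=3, seed=0):
    """Oversample every class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, count in counts.items():
        if count < target:
            members = [list(x) for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k=k, seed=seed)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


if __name__ == "__main__":
    # "Filmnya tidak bagus" means "The film is not good"; dropping "tidak"
    # leaves "filmnya bagus" ("the film is good").
    print(remove_stopwords("Filmnya tidak bagus"))  # filmnya bagus

    # Made-up 2-D features: 3 negative vs 4 positive samples.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [0.9, 0.8], [1.0, 1.0], [0.8, 0.9], [0.95, 0.85]]
    y = ["negative"] * 3 + ["positive"] * 4
    X_bal, y_bal = balance_with_smote(X, y)
    print(Counter(y_bal))
```

In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn (`imblearn.over_sampling`) would normally be used. Oversampling should be applied only to the training split, so that synthetic points do not leak into the evaluation data.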