Sistemasi: Jurnal Sistem Informasi (Sep 2024)

Feature Extraction Optimization to Improve Naïve Bayes Accuracy in Sentiment Analysis of Bulukumba Tourism Objects

  • Darmawan Setiawan,
  • Najirah Umar,
  • M. Adnan Nur

DOI
https://doi.org/10.32520/stmsi.v13i5.4580
Journal volume & issue
Vol. 13, no. 5
pp. 2209 – 2221

Abstract

This research applies sentiment analysis to social media (Twitter) data to ascertain the degree of public satisfaction with Bulukumba tourist attractions. Unstructured text data is a major challenge in sentiment analysis, and the Naïve Bayes algorithm is an effective way to address it because it handles text data well. This study evaluates the performance of Multinomial Naïve Bayes by testing combinations of minimum document frequency (min-df) and maximum document frequency (max-df) parameter values and measuring their effect on accuracy. The analysis begins with collecting Twitter data related to Bulukumba tourist attractions. Preprocessing includes data cleaning, case folding, text normalization, tokenization, stopword removal, and stemming. Feature extraction uses Count Vectorizer and TF-IDF weighting. The process ends with 10-fold cross-validation, which separates the data into training and test sets for sentiment classification, and with evaluation using a confusion matrix. There are 10 test scenarios, each a different combination of min-df and max-df. The min-df values tested are 0.001, 0.002, 0.005, 0.01, and 0.02, and the max-df values are 0.5 and 0.8. The results show that the classification accuracy of Multinomial Naïve Bayes increases with effective min-df and max-df settings. The greatest accuracy was 0.7910, obtained with min-df of 0.001 and max-df of 0.8. The highest average accuracy per test scenario was 0.7272, obtained with min-df of 0.002 and max-df of either 0.5 or 0.8.
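As an illustration of how the min-df and max-df thresholds prune the vocabulary, the following is a minimal plain-Python sketch, not the authors' implementation. It assumes the thresholds are interpreted as proportions of documents, in the way scikit-learn's `CountVectorizer` treats float values; the abstract does not name the library used. The function names `scenarios` and `filter_vocabulary` and the toy tweets are hypothetical.

```python
from itertools import product

# Parameter values listed in the abstract.
MIN_DF_VALUES = [0.001, 0.002, 0.005, 0.01, 0.02]
MAX_DF_VALUES = [0.5, 0.8]


def scenarios():
    """Return every (min_df, max_df) pair: 5 x 2 = 10 test scenarios."""
    return list(product(MIN_DF_VALUES, MAX_DF_VALUES))


def filter_vocabulary(tokenized_docs, min_df, max_df):
    """Keep terms whose document frequency lies within [min_df, max_df].

    Assumption: both thresholds are proportions of the corpus size,
    so a term is kept when it occurs in at least min_df * n_docs and
    at most max_df * n_docs documents.
    """
    n_docs = len(tokenized_docs)
    doc_freq = {}
    for tokens in tokenized_docs:
        # Count each term once per document, however often it repeats.
        for term in set(tokens):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    lower = min_df * n_docs
    upper = max_df * n_docs
    return sorted(t for t, c in doc_freq.items() if lower <= c <= upper)


if __name__ == "__main__":
    # Hypothetical, already preprocessed tweets (not data from the study).
    docs = [
        ["pantai", "bira", "indah"],
        ["pantai", "bira", "kotor"],
        ["pantai", "ramai"],
        ["pantai", "indah"],
    ]
    for min_df, max_df in scenarios():
        vocab = filter_vocabulary(docs, min_df, max_df)
        print(min_df, max_df, vocab)
```

In this sketch a low min-df keeps rare terms, while max-df drops terms that appear in most documents, such as "pantai" in the toy corpus. The surviving vocabulary would then be weighted with TF-IDF and passed to the classifier in each cross-validation fold.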