ETLE Sentiment Analysis Performance Increasement with TF-IDF, MDI Feature Selection, and SVM

Muhammad Syiarul Amrullah; Aji Gautama Putrada; Mohamad Nurkamal Fauzan; Nur Alamsyah

doi:10.32520/stmsi.v13i4.2701

Sistemasi: Jurnal Sistem Informasi (Jul 2024)

Affiliations

Muhammad Syiarul Amrullah: Universitas Logistik dan Bisnis International
Aji Gautama Putrada: Advanced and Creative Networks Research Center, Telkom University
Mohamad Nurkamal Fauzan: Advanced and Creative Networks Research Center, Telkom University
Nur Alamsyah: Advanced and Creative Networks Research Center, Telkom University

Journal volume & issue: Vol. 13, no. 4
pp. 1308–1318

Abstract

In Indonesia, the government, through the Indonesian National Police (POLRI), recently introduced a new regulation, Electronic Traffic Law Enforcement (ETLE). Under this policy, traffic tickets are issued electronically through camera monitoring connected directly to the vehicle registration certificate (STNK) database. The government can measure whether people like or dislike such public policies through sentiment analysis. Prior work has applied sentiment analysis to gauge public response to ETLE, but the reported model reaches an accuracy of only 0.42. This study proposes the use of a support vector machine (SVM), term frequency-inverse document frequency (TF-IDF), and mean decrease in impurity (MDI) to analyze sentiment polarity toward the ETLE policy. First, we retrieve tweets about ETLE from Twitter. Then we apply text pre-processing and stop-word removal. The next step is to compute TF-IDF features. We apply two feature selection methods for comparison: MDI and recursive feature elimination (RFE). Next, we compare two classification models, namely naïve Bayes and SVM. We evaluate the pre-processing stage with, among other metrics, the probability density function (PDF) and the t-test. We use the bag of words (BoW) to evaluate the stop-word removal stage. Finally, sensitivity, specificity, and the receiver operating characteristic (ROC) curve are used to evaluate the feature selection and classification methods. The test results show that TF-IDF produces 1,022 new features. Combining these methods yields six models for comparison. SVM+TF-IDF+MDI performs best among the six, with accuracy and area under the curve (AUC) scores of 0.99 and 0.97, respectively.
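The stop-word removal and TF-IDF steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which TF-IDF variant was used, so the classic tf × log(N/df) weighting is assumed here. The stop-word list, the toy tweets, and the helper names `tokenize` and `tf_idf` are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"the", "is", "a", "and", "of", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (w.strip(".,!?") for w in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def tf_idf(docs):
    """Return (vocabulary, weight matrix) using tf * log(N / df).

    Assumed weighting: relative term frequency times an unsmoothed
    inverse document frequency. Other variants exist.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        matrix.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, matrix


# Toy tweets for illustration only; not taken from the study's dataset.
tweets = [
    "ETLE is a fair policy",
    "ETLE cameras are unfair and confusing",
    "the ETLE ticket is fair",
]
vocab, weights = tf_idf(tweets)
```

Under this weighting, a term that appears in every document (here `etle`) receives zero weight, while terms confined to a single document are weighted most heavily. Each vocabulary term becomes one feature column, which is the kind of matrix a feature selection method such as MDI and a classifier such as SVM would then consume.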

Published in Sistemasi: Jurnal Sistem Informasi

ISSN: 2302-8149 (Print); 2540-9719 (Online)
Publisher: Islamic University of Indragiri
Country of publisher: Indonesia
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://sistemasi.ftik.unisi.ac.id/index.php/stmsi
