Jurnal Linguistik Komputasional (Mar 2019)
Identification of Abusive Content in Indonesian-Language Tweets
Abstract
This study aims to identify tweets containing abusive or offensive content. The work proceeds in five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We employed Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel as our classification algorithms. In our experiment, the linear-kernel SVM outperformed Multinomial Naïve Bayes on all four evaluation metrics. The SVM achieved accuracy, precision, recall, and F1-score of 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas Multinomial Naïve Bayes achieved 0.9834, 0.9912, 0.9762, and 0.9836. The differences between the two algorithms are small, however, with the largest gap being 0.0184 in recall, so we conclude that the Support Vector Machine and Multinomial Naïve Bayes perform almost equally well on this task.
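The reported F1-scores can be cross-checked against the reported precision and recall, since F1 is the harmonic mean of the two. The following is a minimal sketch of that check, not the authors' evaluation code; the function name `f1_score` and the dictionary layout are assumptions made for illustration, and only the numeric values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision, recall, and F1 exactly as reported in the abstract.
# The dictionary keys are illustrative labels, not names from the paper's code.
reported = {
    "SVM (linear kernel)": {"precision": 0.9914, "recall": 0.9946, "f1": 0.9930},
    "Multinomial Naive Bayes": {"precision": 0.9912, "recall": 0.9762, "f1": 0.9836},
}

for name, m in reported.items():
    recomputed = f1_score(m["precision"], m["recall"])
    print(f"{name}: reported F1 = {m['f1']:.4f}, recomputed F1 = {recomputed:.4f}")
```

For both classifiers the recomputed value agrees with the reported F1-score to four decimal places, so the stated precision, recall, and F1 figures are mutually consistent.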