Jurnal Linguistik Komputasional (Mar 2019)

Identifikasi Konten Kasar Pada Tweet Bahasa Indonesia (Identification of Abusive Content in Indonesian-Language Tweets)

  • Ahmad Fathan Hidayatullah,
  • Aufa Aulia Fadila,
  • Kiki Purnama Juwairi,
  • Royan Abida Nayoan

DOI
https://doi.org/10.26418/jlk.v2i1.15
Journal volume & issue
Vol. 2, no. 1
pp. 1 – 5

Abstract


This study aims to identify tweets containing abusive or offensive content. To do so, we performed five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We used Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel as our classification algorithms. The experiments show that the SVM with a linear kernel outperforms Multinomial Naïve Bayes overall. The SVM achieves accuracy, precision, recall, and F1-score values of 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas Multinomial Naïve Bayes achieves 0.9834, 0.9912, 0.9762, and 0.9836. Nevertheless, the two algorithms perform almost equally well, since the differences between their scores are small.
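The preprocessing, classification, and evaluation steps summarized above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the cleaning rules or the feature representation, so the regex cleaning in `preprocess`, raw bag-of-words counts, and Laplace smoothing with `alpha=1.0` are all assumptions. The toy tweets are hypothetical placeholders rather than the study's data. Only Multinomial Naïve Bayes is shown here; a linear-kernel SVM would typically come from an existing library such as scikit-learn. The last line checks that the reported F1-scores agree with the reported precision and recall, because F1 is their harmonic mean.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Assumed cleaning: lowercase, drop URLs and @mentions, keep letter-only tokens."""
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+|@\w+", " ", tweet)
    return re.findall(r"[a-z]+", tweet)


class MultinomialNB:
    """Multinomial Naive Bayes on bag-of-words counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            total = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + self.alpha) / total)
                               for w in self.vocab}
        return self

    def predict_one(self, doc):
        # Sum log-probabilities per class; words unseen in training are ignored.
        scores = {c: self.log_prior[c] + sum(self.log_lik[c][w] for w in doc if w in self.vocab)
                  for c in self.classes}
        return max(scores, key=scores.get)

    def predict(self, docs):
        return [self.predict_one(d) for d in docs]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for the positive (abusive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


# Hypothetical toy tweets: 1 = abusive, 0 = not abusive (not the study's dataset).
train = [
    ("dasar bodoh kamu @user", 1),
    ("kamu memang bodoh dan sampah", 1),
    ("akun sampah dasar", 1),
    ("selamat pagi semua http://t.co/x", 0),
    ("terima kasih banyak kawan", 0),
    ("semoga sukses selalu kawan", 0),
]
test = [("bodoh sekali kamu", 1), ("terima kasih semua", 0), ("sampah", 1), ("selamat sukses", 0)]

model = MultinomialNB(alpha=1.0).fit([preprocess(t) for t, _ in train], [y for _, y in train])
y_true = [y for _, y in test]
y_pred = model.predict([preprocess(t) for t, _ in test])
print(evaluate(y_true, y_pred))  # (1.0, 1.0, 1.0, 1.0)

# Consistency check of the reported SVM and MNB F1-scores.
print(round(f1_score(0.9914, 0.9946), 4), round(f1_score(0.9912, 0.9762), 4))  # 0.993 0.9836
```

On this tiny hypothetical dataset every test tweet is classified correctly, so the perfect metrics say nothing about the study's reported performance. The F1 check, by contrast, uses the abstract's own figures and confirms that each reported F1-score is the harmonic mean of the corresponding precision and recall.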