Identifying Abusive Content in Indonesian-Language Tweets (original title: Identifikasi Konten Kasar Pada Tweet Bahasa Indonesia)

Ahmad Fathan Hidayatullah; Aufa Aulia Fadila; Kiki Purnama Juwairi; Royan Abida Nayoan

doi:10.26418/jlk.v2i1.15

Jurnal Linguistik Komputasional (Mar 2019)

Journal volume & issue: Vol. 2, no. 1
pp. 1 – 5

Abstract

This study aims to identify tweets containing abusive or offensive content. The work proceeds in five steps: data collection, preprocessing, feature extraction, classification, and evaluation. We used two classification algorithms: Multinomial Naïve Bayes and a Support Vector Machine (SVM) with a linear kernel. In our experiments, the linear-kernel SVM outperformed Multinomial Naïve Bayes on every metric. The SVM achieved an accuracy, precision, recall, and F1-score of 0.9928, 0.9914, 0.9946, and 0.9930, respectively, whereas Multinomial Naïve Bayes achieved 0.9834, 0.9912, 0.9762, and 0.9836. Nevertheless, the two algorithms perform almost equally well: the largest gap, in recall, is only 0.0184, and their precision values differ by just 0.0002.
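
The abstract names the pipeline steps but does not detail them. As a minimal sketch, assuming simple regex tokenization and raw bag-of-words counts as features (the paper's actual preprocessing and feature extraction are not described here), the Python below implements the Multinomial Naïve Bayes half of the comparison together with the four evaluation metrics. The names `preprocess`, `TinyMultinomialNB`, `f1_score`, and `evaluate`, as well as the toy tweets, are hypothetical and are not taken from the paper. The linear-kernel SVM is omitted; in practice it would usually come from a library, for example scikit-learn's `LinearSVC`.

```python
import math
import re
from collections import Counter


def preprocess(tweet):
    """Hypothetical cleaning: lowercase, drop URLs, @mentions and the "rt" marker,
    then keep alphabetic tokens. The paper's actual preprocessing may differ."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+|\brt\b", " ", text)
    return re.findall(r"[a-z]+", text)


class TinyMultinomialNB:
    """Multinomial Naive Bayes on bag-of-words counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior, self.log_lik = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(w for d in class_docs for w in d)
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, doc):
        # Sum log-probabilities per class; tokens unseen during training are skipped.
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][w] for w in doc if w in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, F1) with `positive` as the abusive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_score(precision, recall)


# Hypothetical toy data (1 = abusive, 0 = not abusive); not the paper's dataset.
TRAIN = [
    ("dasar bodoh kamu @user", 1),
    ("kamu memang bodoh dan jelek", 1),
    ("selamat pagi semua http://t.co/x", 0),
    ("terima kasih atas bantuannya", 0),
]

if __name__ == "__main__":
    model = TinyMultinomialNB().fit([preprocess(t) for t, _ in TRAIN], [y for _, y in TRAIN])
    test_tweets = ["bodoh sekali kamu", "selamat pagi, terima kasih"]
    preds = [model.predict(preprocess(t)) for t in test_tweets]
    print(preds)  # [1, 0]
    print(evaluate([1, 0], preds))  # (1.0, 1.0, 1.0, 1.0)
    # Consistency check on the reported scores: F1 recomputed from precision and recall.
    print(round(f1_score(0.9914, 0.9946), 4))  # SVM: 0.993
    print(round(f1_score(0.9912, 0.9762), 4))  # Multinomial NB: 0.9836
```

The last two lines recompute F1 as the harmonic mean of the reported precision and recall; both results round to the F1-scores given in the abstract (0.9930 and 0.9836), so the reported figures are internally consistent. Working in log space avoids numeric underflow when many small word probabilities are multiplied together.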

Published in Jurnal Linguistik Komputasional

ISSN: 2621-9336 (Online)
Publisher: Indonesia Association of Computational Linguistics (INACL)
Country of publisher: Indonesia
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing
Website: http://inacl.id/journal/index.php/jlk
