Analysis of K-NN with the Integration of Bag of Words, TF-IDF, and N-Grams for Hate Speech Classification on Twitter

Kuncoro Hadi; Ema Utami

doi:10.30595/juita.v12i2.23829

Jurnal Informatika (Nov 2024)

Affiliations

Kuncoro Hadi: Universitas Amikom Yogyakarta
Ema Utami: Universitas Amikom Yogyakarta

DOI: https://doi.org/10.30595/juita.v12i2.23829
Journal volume & issue: Vol. 12, no. 2
pp. 289 – 298

Abstract

Social media has become one of the primary communication channels in the modern world, but it is also a platform where hate speech spreads easily. This study evaluates the performance of a hate speech classification model built on the K-Nearest Neighbors (K-NN) algorithm combined with three feature extraction techniques: Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), and N-Grams. The dataset consists of 13,169 entries representing a diverse range of hate speech examples commonly encountered on social media platforms. The experiments assess the efficacy of the model with each feature extraction technique. The K-NN model performs best when the k parameter is set to 3 (k=3). Under this configuration, the model achieves an accuracy of 86.88%, a precision of 88.27%, a recall of 86.88%, and an F1-Score of 86.50%. These results show that combining the TF-IDF feature extraction technique with the K-NN algorithm produces superior performance in hate speech classification.
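
The pipeline the abstract describes (n-gram tokens, TF-IDF weighting, then a K-NN vote with k=3) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the smoothed IDF formula, the cosine similarity measure, the unigram-plus-bigram range, and the six toy sentences are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, emit word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs):
    """Smoothed inverse document frequency (one common variant; an assumption)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def cosine(a, b):
    """Dot product of two unit-length sparse vectors, i.e. cosine similarity."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data, NOT taken from the study's 13,169-entry dataset.
train_texts = [
    "i hate you people get out",
    "you people are disgusting",
    "get out of here you are worthless",
    "lovely weather this morning",
    "great match last night",
    "thanks for the lovely dinner",
]
train_labels = ["hate", "hate", "hate", "neutral", "neutral", "neutral"]

train_docs = [ngrams(t) for t in train_texts]
idf = fit_idf(train_docs)
train_vecs = [tfidf(d, idf) for d in train_docs]

def classify(text, k=3):
    """Vectorise a new text with the fitted IDF table and run the K-NN vote."""
    return knn_predict(tfidf(ngrams(text), idf), train_vecs, train_labels, k)

print(classify("you are disgusting people"))
print(classify("lovely dinner last night"))
```

Setting `n_max=1` and replacing the IDF weights with a constant 1 reduces the same code to a plain Bag of Words representation, which is one way to compare the feature extraction techniques under an otherwise identical classifier.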

Keywords: hate speech, k-nearest neighbors, bag of words, tf-idf, n-grams, f1 score

Published in Jurnal Informatika

ISSN: 2086-9398 (Print); 2579-8901 (Online)
Publisher: Universitas Muhammadiyah Purwokerto
Country of publisher: Indonesia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jurnalnasional.ump.ac.id/index.php/JUITA/
