G-Tech (Jan 2024)

Comparative Analysis of NLP Techniques for Hate Speech Classification in Online Communications

  • Gregorius Airlangga

Journal volume & issue
Vol. 8, no. 1

Abstract

This research compared the effectiveness of two Natural Language Processing (NLP) techniques, spaCy's word embeddings and Sklearn's TF-IDF vectorization, in identifying hate speech within online comments. On a balanced dataset, each model was assessed on its ability to classify comments as 'hateful' or 'non-hateful'. The evaluation metrics were precision, recall, F1-score, and overall accuracy. The model using spaCy's word embeddings achieved an accuracy of 65%, with equal precision and recall for both classes. The Sklearn TF-IDF vectorization model performed better, with an overall accuracy of 75% and a stronger ability to correctly identify hateful comments, evidenced by a 77% recall rate. This suggests that the TF-IDF model is more adept at discerning nuanced expressions of hate speech. The findings highlight the critical role of vectorization methods in automated content moderation and the importance of continued innovation and model adaptation to manage the evolving nature of online hate speech.
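As a minimal sketch of how the four reported metrics relate to one another, the following computes precision, recall, F1-score, and accuracy for a binary 'hateful' / 'non-hateful' task. The function name `binary_report` and the toy labels are hypothetical illustrations, not the study's code or data, and the numbers it produces are unrelated to the reported results.

```python
from collections import Counter


def binary_report(y_true, y_pred, positive="hateful"):
    """Return precision, recall, and F1 for the positive class, plus accuracy."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            # Predicted positive: true positive if correct, else false positive.
            counts["tp" if truth == positive else "fp"] += 1
        else:
            # Predicted negative: false negative if a positive was missed.
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical toy labels on a balanced set (4 per class); not from the study.
y_true = ["hateful"] * 4 + ["non-hateful"] * 4
y_pred = ["hateful", "hateful", "hateful", "non-hateful",
          "hateful", "non-hateful", "non-hateful", "non-hateful"]

report = binary_report(y_true, y_pred)
print(report)
```

Recall for the 'hateful' class is the share of truly hateful comments that the model flags, which is why the abstract cites it as evidence of the TF-IDF model's ability to identify hateful comments.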

Keywords