Enhancing Spam Comment Detection on Social Media With Emoji Feature and Post-Comment Pairs Approach Using Ensemble Methods of Machine Learning

Antonius Rachmat Chrismanto; Anny Kartika Sari; Yohanes Suyanto

doi:10.1109/ACCESS.2023.3299853

IEEE Access (Jan 2023)

Affiliations

Antonius Rachmat Chrismanto: Department of Computer Sciences and Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia
Anny Kartika Sari: Department of Computer Sciences and Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia
Yohanes Suyanto: Department of Computer Sciences and Electronics, Universitas Gadjah Mada, Yogyakarta, Indonesia

Journal volume & issue: Vol. 11
pp. 80246 – 80265

Abstract

Whenever a well-known public figure posts on social media, many users are prompted to comment. Unfortunately, not all comments are relevant to the post. Some are spam comments, which can disrupt the overall flow of information. This research employed two strategies to address issues in text spam detection on social media. The first strategy was to use emojis, which many studies discard even though many social media users rely on them to convey their intentions. The second strategy was to use stacked post-comment pairs, in contrast to the many spam detection systems that focus solely on comment-only data. The post-comment pairs were needed to determine whether a comment was relevant to the post context (not spam) or irrelevant to it (spam). This research used the SpamID-Pair dataset, derived from social media, for Indonesian spam comment detection. The experiments indicated that the emoji-text feature, the stacked post-comment pairs, and ensemble voting can improve detection performance (in terms of accuracy and F1). Adding manual features also improved detection performance. Based on the experiments, the best stand-alone method for spam comment detection is the SVM (RBF kernel), while the soft voting ensemble method gives the best average performance.
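The three ideas named in the abstract (keeping emojis as features, pairing each comment with its post, and soft voting) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the emoji character ranges, the "post:"/"comment:" tagging used to combine the pair, and the example probabilities are all assumptions made for illustration. Only the soft voting rule itself (average the per-class probabilities of several classifiers, then pick the highest) follows the standard definition.

```python
import re

# A word, or a single character from some common emoji blocks.
# Assumption: these ranges are a rough, incomplete approximation of "emoji".
TOKEN_RE = re.compile(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]|\w+")


def tokenize(text):
    """Lowercase the text and keep each emoji as its own token instead of dropping it."""
    return TOKEN_RE.findall(text.lower())


def stack_pair(post, comment):
    """Combine a post and a comment into one token list.

    Assumption: tagging each token with its origin is one simple way to
    represent a post-comment pair; the paper may stack the pair differently.
    """
    post_tokens = ["post:" + t for t in tokenize(post)]
    comment_tokens = ["comment:" + t for t in tokenize(comment)]
    return post_tokens + comment_tokens


def soft_vote(probabilities):
    """Average per-class probabilities from several classifiers.

    `probabilities` is a list of dicts mapping class label to probability,
    one dict per classifier. Returns (winning label, averaged probabilities).
    """
    n = len(probabilities)
    classes = probabilities[0].keys()
    averaged = {c: sum(p[c] for p in probabilities) / n for c in classes}
    return max(averaged, key=averaged.get), averaged


if __name__ == "__main__":
    # Hypothetical post and comment, for illustration only.
    features = stack_pair("Selamat pagi semua", "Promo murah \U0001F525 cek profil")
    print(features)

    # Hypothetical outputs of three classifiers for one post-comment pair.
    label, averaged = soft_vote([
        {"spam": 0.9, "not_spam": 0.1},
        {"spam": 0.4, "not_spam": 0.6},
        {"spam": 0.6, "not_spam": 0.4},
    ])
    print(label, averaged)
```

In practice the token list would be turned into numeric features (for example, term counts) and fed to trained classifiers such as an SVM with an RBF kernel; here the classifier outputs are hard-coded so that the voting step can be read in isolation.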

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
