Spam Comments Detection on Instagram Using Machine Learning and Deep Learning Methods

Antonius Rachmat Chrismanto; Afiahayati Afiahayati; Yunita Sari; Anny Kartika Sari; Yohanes Suyanto

doi:10.24843/LKJITI.2022.v13.i01.p05

Lontar Komputer (Aug 2022)

Affiliations

Antonius Rachmat Chrismanto: Universitas Kristen Duta Wacana
Afiahayati Afiahayati: Universitas Gadjah Mada Yogyakarta
Yunita Sari: Universitas Gadjah Mada Yogyakarta
Anny Kartika Sari: Universitas Gadjah Mada Yogyakarta
Yohanes Suyanto: Universitas Gadjah Mada Yogyakarta

Journal volume & issue: Vol. 13, no. 1
pp. 46 – 59

Abstract

The more popular a public figure is on Instagram (IG), the more followers they attract. When a public figure posts something, many other users comment on it. However, not all of these comments are relevant to the post; some are advertisements, links, or clickbait. Comments that are irrelevant to the post are usually called spam comments. Spam comments interfere with the flow of information and may spread misleading information. This research compares machine learning (ML) and deep learning (DL) classification methods on an Indonesian IG spam comment dataset that we collected. The research was conducted in the following steps: dataset preparation, pre-processing, simple normalization, feature generation using TF-IDF and word embeddings, application of ML and DL classification methods, performance evaluation, and comparison. The authors compare the accuracy, F1-score, precision, and recall of the ML and DL results. The results show that ML and DL methods do not differ significantly: the Linear SVM, Extreme Tree (ET), Regression, and Stochastic Gradient Descent algorithms reach an accuracy of 0.93, while the best DL method, the SimpleTransformer BERT architecture, reaches an accuracy of 0.94.
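The abstract names the steps of the classical ML branch (pre-processing, simple normalization, TF-IDF features, a linear classifier, and evaluation by accuracy, precision, recall, and F1) without giving details. The sketch below is a minimal, dependency-free illustration of that branch under stated assumptions: the normalization rules, the smoothed IDF formula (borrowed from scikit-learn's default), logistic loss trained with plain stochastic gradient descent as a stand-in for the SGD classifier, and the toy comments are all hypothetical choices, not the authors' code, hyperparameters, or data.

```python
import math
import random
import re
from collections import Counter

# Hypothetical Indonesian toy comments (NOT the authors' dataset); 1 = spam, 0 = relevant.
TRAIN = [
    ("Cek bio aku ya kak, promo murah!!!", 1),
    ("Follow balik dong, klik link di bio", 1),
    ("Jual followers murah, DM sekarang", 1),
    ("Cantik banget kak fotonya", 0),
    ("Semangat terus untuk konsernya", 0),
    ("Lagunya bagus sekali, suka banget", 0),
]
TEST = [("Promo murah, cek link di bio", 1), ("Fotonya bagus bangettt kak", 0)]


def normalize(text):
    """Assumed 'simple normalization': lowercase, keep letters only, squeeze 3+ repeated chars."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    text = re.sub(r"(.)\1{2,}", r"\1", text)  # "bangettt" -> "banget"
    return text.split()


def fit_idf(docs):
    """Smoothed IDF, same formula as scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(w, b, x):
    """Linear decision function w . x + b over sparse dict vectors."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_sgd(X, y, epochs=50, lr=0.5, seed=0):
    """Logistic regression fitted with plain stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            z = max(-30.0, min(30.0, score(w, b, X[i])))  # clip to avoid exp overflow
            g = 1.0 / (1.0 + math.exp(-z)) - y[i]          # d(log loss)/dz
            for t, v in X[i].items():
                w[t] = w.get(t, 0.0) - lr * g * v
            b -= lr * g
    return w, b


def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 with spam (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def run_demo():
    """Fit IDF and the classifier on TRAIN only, then score the held-out TEST comments."""
    train_docs = [normalize(text) for text, _ in TRAIN]
    idf = fit_idf(train_docs)
    w, b = train_sgd([tfidf(d, idf) for d in train_docs], [label for _, label in TRAIN])
    preds = [int(score(w, b, tfidf(normalize(text), idf)) > 0) for text, _ in TEST]
    return evaluate([label for _, label in TEST], preds)


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` trains on six hypothetical comments and scores two held-out ones, so its output says nothing about the results reported in the paper. A practical version would more likely rely on scikit-learn's `TfidfVectorizer` and `SGDClassifier` than on hand-written functions; they are spelled out here only to make each step of the pipeline visible.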

Published in Lontar Komputer

ISSN: 2088-1541 (Print); 2541-5832 (Online)
Publisher: Udayana University, Institute for Research and Community Services
Country of publisher: Indonesia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://ojs.unud.ac.id/index.php/lontar
