IJAIN (International Journal of Advances in Intelligent Informatics) (Aug 2024)

Comparative study of predictive models for hoax and disinformation detection in Indonesian news

  • Nadia Paramita Retno Adiati,
  • Dimas Febriyan Priambodo,
  • Girinoto Girinoto,
  • Santi Indarjani,
  • Akhmad Rizal,
  • Arga Prayoga,
  • Yehezikha Beatrix

DOI
https://doi.org/10.26555/ijain.v10i3.878
Journal volume & issue
Vol. 10, no. 3
pp. 504 – 516

Abstract


False information now spreads easily, including in Indonesia. According to Press Release No.485/HM/KOMINFO/12/2021, the Ministry of Communication and Information has blocked access to 565,449 pieces of negative content and published 1,773 clarifications on hoax and disinformation content. Previous research has addressed this problem, but fake news still needs to be classified further into disinformation and hoaxes. This study compares our proposed model, an ensemble of shallow learning predictive models (Random Forest, Passive Aggressive Classifier, and Cosine Similarity), with a deep learning model that uses BERT-Indo for classification. Both models are trained on equivalent datasets containing 8,757 news articles: 3,000 valid, 3,000 hoax, and 2,757 disinformation. The articles were obtained from websites such as CNN, Kompas, Detik, Kominfo, Temanggung Mediacenter, Hoaxdb Aceh, Turnback Hoax, and Antara, and were then cleaned of unnecessary elements such as punctuation marks, numbers, Unicode characters, stopwords, and suffixes using the Sastrawi library. At the benchmarking stage, the shallow learning models are combined through ensemble learning with hard voting to increase accuracy. The ensemble achieves an accuracy of 98.125%, precision of 98.2%, F1 score of 98.1%, and recall of 98.1%, compared to the BERT-Indo model, which achieved only 96.918% accuracy, 96.069% precision, 96.937% F1 score, and 96.882% recall. Based on accuracy, the shallow learning model is superior to the deep learning model. This machine learning model is expected to help combat the spread of hoaxes and disinformation in Indonesian news. In addition, this research allows false news to be classified in more detail, as either hoaxes or disinformation.
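
The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `hard_vote`, the sample predictions, and the tie-breaking rule (prefer the label chosen by the model listed first) are all assumptions made for illustration, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Combine per-model label lists by majority (hard) voting.

    predictions_per_model: list of equal-length label lists, one per model.
    Ties are broken in favour of the model listed first; this rule is an
    assumption, not something stated in the abstract.
    """
    if not predictions_per_model:
        raise ValueError("need at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        winner = next(label for label in votes if counts[label] == top)
        combined.append(winner)
    return combined


# Hypothetical outputs of the three base models on four articles.
rf_preds = ["valid", "hoax", "disinformation", "hoax"]            # Random Forest
pa_preds = ["valid", "hoax", "hoax", "valid"]                     # Passive Aggressive
cos_preds = ["hoax", "hoax", "disinformation", "disinformation"]  # Cosine Similarity

ensemble_preds = hard_vote([rf_preds, pa_preds, cos_preds])
print(ensemble_preds)
```

With three classes and three voters, all three models can disagree (as on the fourth article above), so some tie-breaking rule is needed; a real system might instead fall back to the most accurate base model.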

Keywords