Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Feb 2023)

Covid-19 Fake News Detection on Twitter Based on Author Credibility Using Information Gain and KNN Methods

  • Nanda Ihwani Saputri,
  • Yuliant Sibaroni,
  • Sri Suryani Prasetiyowati

DOI
https://doi.org/10.29207/resti.v7i1.4871
Journal volume & issue
Vol. 7, no. 1
pp. 185 – 192

Abstract

Twitter is a social media platform used to share many kinds of information of interest to its users, including information about COVID-19, an infectious disease caused by SARS-CoV-2 that is spreading throughout the world at an alarming rate. The World Health Organization (WHO) claims that the spread of COVID-19 is aided by the spread of fake news. A COVID-19 fake news detector is therefore needed so that users can check the truth of the news and do not fall for circulating hoaxes. This study aims to classify COVID-19 news on Twitter based on author credibility. Credibility here means a person's perception of the validity of information; it is a multidimensional concept that recipients of information use to assess the source of a communication. The methods used in this research are Information Gain and KNN. KNN (K-Nearest Neighbor) is a supervised learning algorithm that classifies data based on already-classified training data. Information Gain is used to rank the most influential attributes, and KNN is used to classify each data point according to its nearest neighbors in the training data. The research consists of the following main stages: data collection (crawling), data preprocessing, feature extraction, feature selection, splitting the data into training and testing sets, KNN classification, and evaluation. The research obtained an accuracy of 91%, a correlation between credibility and hoax of 0.115, and a p-value < 0.005.
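
The combination of Information Gain ranking and KNN classification described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three binary features (for example, whether an account is verified), the six toy rows, the function names, and the choice of k = 3 with Euclidean distance are all assumptions made for illustration only.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one discrete feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder

def knn_predict(train_rows, train_labels, query, features, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    def dist(row):
        return math.sqrt(sum((row[f] - query[f]) ** 2 for f in features))
    nearest = sorted(zip(train_rows, train_labels),
                     key=lambda pair: dist(pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: each row is (verified, has_profile_url, unrelated_flag).
rows = [(1, 1, 0), (1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 1), (0, 0, 1)]
labels = ["real", "real", "real", "hoax", "hoax", "hoax"]

# Feature selection: rank features by information gain, keep the top two.
ranked = sorted(range(3),
                key=lambda f: information_gain(rows, labels, f),
                reverse=True)
selected = ranked[:2]

# Classification: KNN vote using only the selected features.
prediction = knn_predict(rows, labels, (1, 1, 0), selected, k=3)
print(ranked, prediction)
```

In this toy data the first feature separates the classes perfectly, so it receives the highest information gain, while the third feature carries no information and is dropped before the KNN step.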

Keywords