IEEE Access (Jan 2023)

Generating Fake News Detection Model Using A Two-Stage Evolutionary Approach

  • Jeffery T. H. Kong,
  • W. K. Wong,
  • Filbert H. Juwono,
  • Catur Apriono

DOI
https://doi.org/10.1109/ACCESS.2023.3303321
Journal volume & issue
Vol. 11
pp. 85067 – 85085

Abstract

While fake news is morally reprehensible, irresponsible parties intentionally use it to achieve their goals by disseminating it to vulnerable and targeted groups. Machine learning techniques have been researched extensively for fake news detection, and evolutionary algorithms are now gaining popularity in the research community. In this study, a two-stage evolutionary approach is proposed to generate and optimize a mathematical equation for fake news detection. In the first stage, a tree-based Genetic Programming (GP) algorithm is used to generate mathematical expressions that detect correlations between the language-independent (Lang-IND) features extracted from Fake.my-COVID19, a newly curated fake news dataset in mixed Malay-English. The uniqueness of the proposed approach is that the mathematical expressions are formed from basic arithmetic operators, optionally extended with complex arithmetic operators, drawn from addition, multiplication, subtraction, division, square, abs, log1p, sign, square root, and exponential, with the Lang-IND features as the variables. Prior to the second stage, a sensitivity analysis is applied to shorten the best equation while maintaining its F1-score. In the second stage, Adaptive Differential Evolution (ADE) is used to fine-tune the mathematical model. The experimental results show that the proposed two-stage evolutionary approach can be applied to fake news detection and that the model can learn to predict using the Lang-IND features. Results from the first stage show that the equation from GP achieves an F1-score of 83.23% on the Fake.my-COVID19 dataset using complex arithmetic operators at a tree depth of 8. After the fine-tuning stage, the F1-score increases to 84.44%. The proposed two-stage evolutionary approach outperforms the baselines of six commonly used machine learning algorithms, among which Random Forest has the highest F1-score of 84.07%. The mathematical model is also tested separately on two other unseen datasets that differ in topic domain or language, and it achieves acceptable F1-scores.
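
The fine-tuning idea in the second stage can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the equation in `predict`, the two features, the six toy rows, and the hyperparameters are hypothetical and are not taken from the paper. The sketch also uses plain differential evolution (rand/1/bin) in place of the adaptive variant (ADE) named in the abstract, and it only tunes the constants of a fixed expression to raise the F1-score.

```python
import math
import random

def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = fake)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def predict(consts, rows):
    """Hypothetical stage-one equation; a positive value means 'fake'."""
    c0, c1, c2 = consts
    return [
        1 if c0 * math.log1p(abs(x0)) - c1 * math.sqrt(abs(x1)) + c2 > 0 else 0
        for x0, x1 in rows
    ]

def differential_evolution(fitness, start, pop_size=20, generations=30,
                           f=0.5, cr=0.9, seed=0):
    """Plain DE/rand/1/bin (not adaptive); maximizes `fitness`."""
    rng = random.Random(seed)
    dim = len(start)
    # Seed the population with the starting constants plus perturbed copies.
    pop = [list(start)] + [
        [c + rng.uniform(-1.0, 1.0) for c in start] for _ in range(pop_size - 1)
    ]
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # force at least one mutated gene
            trial = [
                pop[a][k] + f * (pop[b][k] - pop[c][k])
                if (rng.random() < cr or k == j_rand) else pop[i][k]
                for k in range(dim)
            ]
            trial_score = fitness(trial)
            if trial_score >= scores[i]:  # greedy selection never gets worse
                pop[i], scores[i] = trial, trial_score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]

# Made-up toy data: two hypothetical feature values per article and a label.
rows = [(5.0, 1.0), (4.0, 0.5), (3.0, 2.0), (0.5, 3.0), (1.0, 4.0), (0.2, 1.0)]
labels = [1, 1, 1, 0, 0, 0]

start = [1.0, 1.0, 0.0]  # constants as they might come out of stage one
baseline = f1_score(labels, predict(start, rows))
tuned, tuned_score = differential_evolution(
    lambda consts: f1_score(labels, predict(consts, rows)), start
)
print(f"F1 before tuning: {baseline:.2f}, after tuning: {tuned_score:.2f}")
```

Because the starting constants are kept in the initial population and a trial vector replaces its parent only when it scores at least as well, the tuned F1-score in this sketch can never fall below the starting one.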

Keywords