A Multiple change-point detection framework on linguistic characteristics of real versus fake news articles

Nikolas Petrou; Chrysovalantis Christodoulou; Andreas Anastasiou; George Pallis; Marios D. Dikaiakos

doi:10.1038/s41598-023-32952-3

Scientific Reports (Apr 2023)

Affiliations

Nikolas Petrou: Computer Science Department, University of Cyprus
Chrysovalantis Christodoulou: Computer Science Department, University of Cyprus
Andreas Anastasiou: Department of Mathematics and Statistics, University of Cyprus
George Pallis: Computer Science Department, University of Cyprus
Marios D. Dikaiakos: Computer Science Department, University of Cyprus

DOI: https://doi.org/10.1038/s41598-023-32952-3
Journal volume & issue: Vol. 13, no. 1
pp. 1 – 12

Abstract


Extracting information from the text of news articles has proven important for developing efficient fake news detection systems. In particular, to fight disinformation, researchers have concentrated on exploiting linguistic characteristics that are common in fake news and can aid in detecting false content automatically. Although these approaches have been shown to perform well, the research community has also shown that both language and word use in literature evolve over time. The objective of this paper is therefore to explore how the linguistic characteristics of fake and real news change over time. To achieve this, we establish a large dataset containing linguistic characteristics of various articles over the years. In addition, we introduce a novel framework in which articles are classified into specified topics based on their content, and the most informative linguistic features are extracted using dimensionality reduction methods. Finally, the framework detects changes in the extracted linguistic features of real and fake news articles over time by incorporating a novel change-point detection method. Applying our framework to the established dataset, we found that the linguistic characteristics of an article’s title seem to be particularly important in capturing notable movements in the similarity level of “Fake” and “Real” articles.
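
As a rough illustration of the change-point idea mentioned in the abstract, the sketch below locates a single shift in the mean of a time series using a basic CUSUM statistic. This is a standard textbook technique swapped in for illustration only; it is not the novel change-point method the paper incorporates, and it handles one change-point rather than multiple. The function name `cusum_change_point` and the `similarity` values are hypothetical and do not come from the paper or its dataset.

```python
import math
from typing import List, Tuple


def cusum_change_point(series: List[float]) -> Tuple[int, float]:
    """Return (split, statistic) for the most likely single mean shift.

    A split of b means the series is divided into series[:b] and series[b:].
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")

    total = sum(series)
    left_sum = 0.0
    best_split, best_stat = 1, -1.0

    for b in range(1, n):
        left_sum += series[b - 1]
        right_sum = total - left_sum
        # CUSUM contrast: scaled difference between the two segment means.
        scale = math.sqrt(b * (n - b) / n)
        stat = abs(scale * (left_sum / b - right_sum / (n - b)))
        if stat > best_stat:
            best_split, best_stat = b, stat

    return best_split, best_stat


# Hypothetical per-period similarity scores between "Fake" and "Real" articles
# (made-up numbers, assumed purely for illustration).
similarity = [0.80, 0.82, 0.79, 0.81, 0.80, 0.60, 0.62, 0.59, 0.61, 0.60]
split, stat = cusum_change_point(similarity)
print(f"estimated change after observation {split}, statistic {stat:.3f}")
```

In practice the statistic would be compared against a threshold before declaring a change, and a multiple change-point method would apply such a contrast repeatedly over sub-intervals of the series.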

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
