Data in Brief (Apr 2024)

Dataset for detecting and characterizing Arab computation propaganda on X

  • Bodor Moheel Almotairy,
  • Manal Abdullah,
  • Dimah Hussein Alahmadi

Journal volume & issue
Vol. 53
p. 110089

Abstract

Arab nations are greatly influenced by computational propaganda, and detecting it has become a trending topic in social media research. Despite these efforts, there is still no clear definition of what characterizes propagandistic behaviour. Additionally, earlier datasets were acquired and labelled for a specific study and neglected thereafter. As a result, researchers cannot assess whether the proposed AI detectors generalize. There is a lack of real ground truth, either to characterize Arab propagandist behaviours or to evaluate newly proposed detectors. The provided dataset aims to close this research gap by demonstrating the value of characterizing Arab computational propaganda on X (Twitter). To ensure data quality, the propagandist users' data was requested from the X Transparency Center. Although the data released by X relates to users identified as propagandists, the individual tweets were not classified as propaganda or non-propaganda. Propagandists usually mix propaganda and non-propaganda tweets to hide their identities. Therefore, three journalist volunteers labelled 2100 tweets as propaganda or not, and then labelled each propaganda tweet according to the propaganda technique used. The dataset covers sports and banking issues. In total, it consists of 16,355,558 tweets, with their metadata, posted by propagandist users in 2019, plus the 2100 labelled tweets. The dataset helps the research community apply supervised and unsupervised machine learning and deep learning algorithms to classify the credibility of Arab tweets and users. This paper also suggests looking at behaviour rather than content to distinguish propaganda communication, and the datasets enable deep non-textual analysis of the main characteristics of Arab computational propaganda on X.

Keywords