Scientific Reports (Mar 2021)

From rumor to genetic mutation detection with explanations: a GAN approach

  • Mingxi Cheng,
  • Yizhi Li,
  • Shahin Nazarian,
  • Paul Bogdan

DOI
https://doi.org/10.1038/s41598-021-84993-1
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 14

Abstract

Social media have emerged as increasingly popular means and environments for information gathering and propagation. This vigorous growth of social media has contributed not only to a pandemic (fast-spreading and far-reaching) of rumors and misinformation, but also to an urgent need for text-based rumor detection strategies. To speed up the detection of misinformation, traditional rumor detection methods based on hand-crafted feature selection need to be replaced by automatic artificial intelligence (AI) approaches. AI decision-making systems must provide explanations in order to assure users of their trustworthiness. Inspired by the thriving development of generative adversarial networks (GANs) in text applications, we propose a GAN-based layered model for rumor detection with explanations. To demonstrate the universality of the proposed approach, we also apply it to a case study on gene classification with mutation detection. Like rumor detection, gene classification can be formulated as a text-based classification problem. Unlike fake news detection, which needs a previously collected verified news database, our model provides explanations in rumor detection based on tweet-level texts only, without referring to a verified news database. The layered structure of both the generative and discriminative models contributes to the outstanding performance. The layered generators produce rumors by intelligently inserting controversial information into non-rumors, and force the layered discriminators to detect detailed glitches and deduce exactly which parts of the sentence are problematic. On average, in the rumor detection task, our proposed model outperforms state-of-the-art baselines on the PHEME dataset by 26.85% in terms of macro-F1. The excellent performance of our model on textual sequences is also demonstrated by the gene mutation case study, on which it achieves a 72.69% macro-F1 score.
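
Both headline results are reported as macro-F1, the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The following is a minimal sketch of that metric only, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made up for illustration and are not drawn from PHEME.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed
    scores = []
    for c in labels:
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels, for illustration only
y_true = ["rumor", "rumor", "non-rumor", "non-rumor", "non-rumor"]
y_pred = ["rumor", "non-rumor", "non-rumor", "non-rumor", "rumor"]
score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

In this toy case the "rumor" class scores an F1 of 1/2 and the "non-rumor" class 2/3, so the macro average is 7/12, even though the two classes have different sizes.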