IEEE Access (Jan 2022)

Detecting Documents With Inconsistent Context

  • Dongin Jung,
  • Misuk Kim,
  • Yoon-Sik Cho

DOI
https://doi.org/10.1109/ACCESS.2022.3204151
Journal volume & issue
Vol. 10
pp. 98970 – 98980

Abstract

Extremely large volumes of documents are available from online news platforms and social media. While the quantity of these documents has grown exponentially, the majority lack quality, which can cause digital fatigue or promote misinformation. To address this, we propose a novel framework that evaluates the quality of documents in terms of consistency. We model low-quality document detection as a binary classification task that measures how consistent a document's contents are. Specifically, we relax the problem by treating each sentence or paragraph as a node, so that a given document becomes a network of nodes. We show how we define the supernode in this network and that it is informative enough to detect whether the document is consistent. We believe this scheme can be applied to various applications, including fake news detection and document screening with qualitative evaluations. We achieve state-of-the-art results on existing tasks using the NELA17 dataset and the YH-News dataset, which we release in this paper.
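
The idea of treating sentences as nodes and summarizing them with a supernode can be illustrated with a toy sketch. This is not the authors' model: the abstract does not specify how the supernode is defined or how the classifier is trained. The sketch assumes bag-of-words vectors in place of a learned encoder, a supernode that is simply the average of all node vectors, and a fixed threshold in place of a trained binary classifier. All function names and the threshold value are hypothetical.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Bag-of-words term counts for one sentence (assumed stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def supernode(vectors):
    """Hypothetical supernode: the average of all node vectors in the document."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return Counter({term: count / n for term, count in total.items()})


def consistency_score(sentences):
    """Similarity between the supernode and the sentence node that agrees with it least."""
    nodes = [sentence_vector(s) for s in sentences]
    hub = supernode(nodes)
    return min(cosine(hub, v) for v in nodes)


def is_inconsistent(sentences, threshold=0.5):
    """Binary decision; the threshold is an assumed constant, not a learned parameter."""
    return consistency_score(sentences) < threshold


consistent_doc = [
    "the council approved the new budget",
    "the budget funds the new schools",
    "the council funds the schools",
]
inconsistent_doc = [
    "the council approved the new budget",
    "the budget funds the new schools",
    "penguins eat fish in antarctica",
]
```

In this sketch the off-topic sentence in `inconsistent_doc` shares no terms with the rest of the document, so it sits far from the supernode and pulls the score down. A learned model would replace both the encoder and the threshold.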

Keywords