Scientific Reports (Nov 2024)

Identifying fabricated networks within authorship-for-sale enterprises

  • Simon J. Porter,
  • Leslie D. McIntosh

DOI
https://doi.org/10.1038/s41598-024-71230-8
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 21

Abstract

It is estimated that 2% of all journal submissions across all disciplines originate from paper mills, creating a significant risk that the body of research we rely on to progress becomes corrupted, and placing an undue burden on the submission process to reject these articles. By understanding how the business of paper mills operates (the technological approaches they adopt, as well as the social structures they require), the research community can be empowered to develop strategies that make it harder, or ideally impossible, for them to operate. Most contemporary work in paper-mill detection has focused on identifying the signals left behind in the text or structure of fabricated papers by the technological approaches that paper mills employ. As those technologies advance, these signals will become harder to detect. Fabricated papers do not need only text, images, and data, however; they also require a fabricated or partially fabricated network of authors. Most 'authors' on a fabricated paper have not been associated with the research, but rather are added through a transaction. This lack of deeper connection means that there is a low likelihood that co-authors on fabricated papers will ever appear together on the same paper more than once. This paper constructs a model that encodes some of the key characteristics of this activity in an 'authorship-for-sale' network, with the aim of creating a robust method to detect this type of activity. A characteristic network fingerprint arises from this model that provides a robust statistical approach to the detection of paper-mill networks. The model suggested in this paper detects networks that have a statistically significant overlap with those found by other approaches that principally rely on textual analysis for the detection of fraudulent papers. Researchers connected to networks identified using the methodology outlined in this paper are shown to be connected with 37% of papers identified through the tortured-phrase and clay-feet methods deployed in the Problematic Paper Screener website. Finally, methods to limit the expansion and propagation of these networks are discussed in both technological and social terms.
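
The core signal described above, that co-authors on fabricated papers rarely appear together more than once, can be illustrated with a small sketch. This is not the model constructed in the paper; it is a simplified stand-in that only measures how often co-author pairs recur across a set of papers. The function name `repeat_coauthor_rate` and the toy author lists are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def repeat_coauthor_rate(papers):
    """Return the fraction of distinct co-author pairs seen on more than one paper.

    `papers` is a list of author lists, one list per paper (hypothetical input format).
    A value near zero means co-authors almost never publish together twice.
    """
    pair_counts = Counter()
    for authors in papers:
        # Count each unordered pair once per paper; sorting makes (a, b) == (b, a).
        for pair in combinations(sorted(set(authors)), 2):
            pair_counts[pair] += 1
    if not pair_counts:
        return 0.0
    repeated = sum(1 for n in pair_counts.values() if n > 1)
    return repeated / len(pair_counts)


# Toy data (invented): a small group that collaborates repeatedly...
organic = [["a", "b", "c"], ["a", "b"], ["b", "c", "d"]]
# ...and one author who shares every paper with entirely new co-authors.
transactional = [["x", "p", "q"], ["x", "r", "s"], ["x", "t", "u"]]

print(repeat_coauthor_rate(organic))        # 0.4
print(repeat_coauthor_rate(transactional))  # 0.0
```

A low rate on its own is not evidence of paper-mill activity, since early-career researchers and one-off collaborations also produce non-recurring pairs. The paper's approach relies on a statistical network fingerprint rather than a single ratio such as this one.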