Exploring the Landscape of Intrinsic Plagiarism Detection: Benchmarks, Techniques, Evolution, and Challenges

Muhammad Faraz Manzoor; Muhammad Shoaib Farooq; Muhammad Haseeb; Uzma Farooq; Sohail Khalid; Adnan Abid

doi:10.1109/ACCESS.2023.3338855

IEEE Access (Jan 2023)

Affiliations

Muhammad Faraz Manzoor: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
Muhammad Shoaib Farooq: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
Muhammad Haseeb: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
Uzma Farooq: Department of Computer Science, University of Management and Technology, Lahore, Pakistan
Sohail Khalid: Petroleum Engineering Application Service Department, Saudi Aramco, Dhahran, Saudi Arabia
Adnan Abid: Department of Data Science, Faculty of Computing and Information Technology, University of the Punjab, Lahore, Pakistan

DOI: https://doi.org/10.1109/ACCESS.2023.3338855
Journal volume & issue: Vol. 11
pp. 140519 – 140545

Abstract

In text analysis, intrinsic plagiarism detection aims to identify plagiarized content within a document by determining whether all parts of the text originate from the same author. Now that content generation tools based on Large Language Models (LLMs), such as ChatGPT, are publicly available, the challenge of intrinsic plagiarism has become increasingly significant in various domains. Consequently, there is a growing demand for robust and accurate detection methods. This study conducts a comprehensive Systematic Literature Review (SLR), analyzing 44 research papers that explore various facets of intrinsic plagiarism detection, including common datasets, feature extraction techniques, and detection methods. The SLR also highlights the evolution of detection approaches over time and the challenges faced in this context, especially those associated with low-resource languages. To the best of our knowledge, no existing SLR focuses exclusively on intrinsic plagiarism detection; this study bridges that gap in the literature and offers valuable insights to researchers and practitioners. By consolidating state-of-the-art findings, this SLR serves as a foundation for future research, enabling the development of more effective and efficient plagiarism detection solutions to combat the ever-evolving challenges posed by plagiarism in today’s digital age.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
