IEEE Access (Jan 2024)

A Proposed Model for Distinguishing Between Human-Based and ChatGPT Content in Scientific Articles

  • Toka A. Mohamed
  • Mohamed H. Khafgy
  • Ahmed B. Elsedawy
  • Ahmed S. Ismail

DOI
https://doi.org/10.1109/ACCESS.2024.3448315
Journal volume & issue
Vol. 12
pp. 121251 – 121260

Abstract

This study introduces an approach to the growing challenge of detecting and distinguishing ChatGPT-generated content in scientific articles, particularly in the context of Learning Management Systems (LMS). Leveraging state-of-the-art large language models, including the Robustly Optimized BERT Pretraining Approach (RoBERTa), the Text-to-Text Transfer Transformer (T5), and a Generative Pre-trained Transformer (EleutherAI GPT-Neo-125M), our methodology focuses on incorporating the LMS concept into the research framework. To construct a dataset representative of the diverse landscape of scientific abstracts, samples were gathered from articles written by human authors and from articles generated by ChatGPT within the LMS framework. The models (RoBERTa, T5, and EleutherAI GPT-Neo-125M) were then trained on this dataset, demonstrating their adaptability to the distinct characteristics of both human-written and AI-generated content within the LMS context. The approach was evaluated using a range of metrics and achieved an accuracy exceeding 99%. This result indicates that the methodology can reliably distinguish content generated by ChatGPT within the LMS from content authored by human contributors, thereby advancing the field of content differentiation in scientific discourse.
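
The abstract reports evaluation "using a range of metrics" without naming them here. As a minimal sketch, assuming the usual binary-classification metrics (accuracy, precision, recall, F1) and treating ChatGPT-generated text as the positive class, such an evaluation could look like the following. The names `BinaryMetrics` and `evaluate` and the label encoding (1 = ChatGPT, 0 = human) are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> BinaryMetrics:
    """Compute standard binary-classification metrics.

    Assumed encoding (hypothetical): 1 = ChatGPT-generated, 0 = human-written.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no positive predictions or labels).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the study.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    predictions = [1, 1, 0, 0, 0, 1, 1, 0]
    print(evaluate(labels, predictions))
```

Reporting precision and recall alongside accuracy is useful in this setting because a detector with high accuracy can still mislabel a meaningful share of human-written abstracts if the two classes are imbalanced.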

Keywords