Machine Learning with Applications (Dec 2023)
Use prompt to differentiate text generated by ChatGPT and humans
Abstract
As the Chat Generative Pre-trained Transformer (ChatGPT) achieves increasing proficiency across diverse language tasks, concerns have grown about its implications for academic integrity and the risk of plagiarism. Traditional plagiarism detection tools analyze only the text passages themselves and may therefore fall short in identifying machine-generated text. This study introduces a method that uses both prompts and essays to differentiate machine-generated from human-written text, with the goal of improving classification accuracy and addressing academic integrity concerns. Leveraging a dataset of student-written essays responding to eight distinct prompts, we generated comparable essays with ChatGPT. We computed similarity scores among the machine-generated essays (“within” scores) and between human-written and machine-generated essays (“between” scores). We then used the percentile of each “between” score within the distribution of “within” scores to gauge the probability that an essay is machine-generated. The proposed method achieved high classification accuracy on the test set, with an AUC of 0.991, a false positive rate of 0.01, and a false negative rate of 0.037, validating its effectiveness in distinguishing machine-generated from human-written essays and showing that it outperforms existing approaches based solely on text passages. This research presents a straightforward and effective way to detect machine-generated essays using prompts, providing a reliable means of maintaining academic integrity in the era of advanced language models such as ChatGPT. The method is not without limitations, however, and further research is warranted to investigate its performance across diverse educational contexts, varied prompts, and different model hyperparameters.
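For illustration only, the sketch below shows one way the percentile-based scoring described in the abstract might be implemented. It assumes essays are represented as embedding vectors and that cosine similarity is the similarity measure; the function name `machine_generated_probability`, the choice of averaging the candidate essay's similarities to the machine-generated set, and the embedding representation are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def machine_generated_probability(candidate_embedding, machine_embeddings):
    """Estimate how machine-like a candidate essay is for one prompt.

    Hypothetical sketch: computes pairwise "within" similarities among
    ChatGPT-generated essays for the same prompt, a "between" similarity
    for the candidate essay, and returns the percentile of the "between"
    score within the "within" distribution. Higher values indicate the
    candidate resembles the machine-generated essays more closely.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # "Within" scores: all pairwise similarities among machine-generated
    # essays answering the same prompt (requires at least two essays).
    n = len(machine_embeddings)
    within = np.array([
        cosine(machine_embeddings[i], machine_embeddings[j])
        for i in range(n) for j in range(i + 1, n)
    ])

    # "Between" score: mean similarity of the candidate essay to the
    # machine-generated essays (the aggregation here is an assumption).
    between = np.mean([cosine(candidate_embedding, e) for e in machine_embeddings])

    # Percentile of the "between" score within the "within" distribution,
    # i.e., the fraction of "within" scores at or below the "between" score.
    return float(np.mean(within <= between))

# Toy usage with random vectors standing in for essay embeddings.
rng = np.random.default_rng(0)
machine_essays = [rng.normal(size=128) for _ in range(10)]
candidate = rng.normal(size=128)
print(machine_generated_probability(candidate, machine_essays))
```

Averaging the candidate's similarities to all machine-generated essays is one plausible reduction to a single "between" score; the paper's exact aggregation, essay representation, and similarity metric may differ.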