Machine Learning with Applications (Dec 2023)
Use prompt to differentiate text generated by ChatGPT and humans
Abstract
As the Chat Generative Pre-trained Transformer (ChatGPT) achieves increasing proficiency across diverse language tasks, concerns have grown about its implications for academic integrity and the risk of plagiarism. Traditional plagiarism detection tools analyze only the text passages themselves and may therefore fall short in identifying machine-generated text. This study introduces a method that uses both prompts and essays to differentiate machine-generated from human-written text, with the goal of improving classification accuracy and addressing academic integrity concerns. Leveraging a dataset of student-written essays responding to eight distinct prompts, we generated comparable essays with ChatGPT. We computed similarity scores among the machine-generated essays (“within” scores) and between human-written and machine-generated essays (“between” scores). We then used the percentile of each “between” score within the distribution of “within” scores to gauge the probability that an essay is machine-generated. The proposed method achieved high classification accuracy on the test set, with an AUC of 0.991, a false positive rate of 0.01, and a false negative rate of 0.037, validating its effectiveness in distinguishing machine-generated from human-written essays and showing that it outperforms existing approaches based solely on text passages. This research presents a straightforward and effective way to detect machine-generated essays using prompts, providing a reliable means of maintaining academic integrity in the era of advanced language models such as ChatGPT. The method is not without limitations, however, and further research is warranted to investigate its performance across diverse educational contexts, varied prompts, and different model hyperparameters.
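For illustration only, the sketch below shows one way the percentile-based scoring described in the abstract might be implemented. It assumes essays are represented as embedding vectors and that cosine similarity is the similarity measure; the function name `machine_generated_probability`, the choice of averaging the candidate essay's similarities to the machine-generated set, and the embedding representation are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def machine_generated_probability(candidate_embedding, machine_embeddings):
    """Estimate how machine-like a candidate essay is for one prompt.

    Hypothetical sketch: computes pairwise "within" similarities among
    ChatGPT-generated essays for the same prompt, a "between" similarity
    for the candidate essay, and returns the percentile of the "between"
    score within the "within" distribution. Higher values indicate the
    candidate resembles the machine-generated essays more closely.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # "Within" scores: all pairwise similarities among machine-generated
    # essays answering the same prompt (requires at least two essays).
    n = len(machine_embeddings)
    within = np.array([
        cosine(machine_embeddings[i], machine_embeddings[j])
        for i in range(n) for j in range(i + 1, n)
    ])

    # "Between" score: mean similarity of the candidate essay to the
    # machine-generated essays (the aggregation here is an assumption).
    between = np.mean([cosine(candidate_embedding, e) for e in machine_embeddings])

    # Percentile of the "between" score within the "within" distribution,
    # i.e., the fraction of "within" scores at or below the "between" score.
    return float(np.mean(within <= between))

# Toy usage with random vectors standing in for essay embeddings.
rng = np.random.default_rng(0)
machine_essays = [rng.normal(size=128) for _ in range(10)]
candidate = rng.normal(size=128)
print(machine_generated_probability(candidate, machine_essays))
```

Averaging the candidate's similarities to all machine-generated essays is one plausible reduction to a single "between" score; the paper's exact aggregation, essay representation, and similarity metric may differ.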