ChatGPT versus human essayists: an exploration of the impact of artificial intelligence for authorship and academic integrity in the humanities

T. Revell; W. Yeadon; G. Cahilly-Bretzin; I. Clarke; G. Manning; J. Jones; C. Mulley; R. J. Pascual; N. Bradley; D. Thomas; F. Leneghan

doi:10.1007/s40979-024-00161-8

International Journal for Educational Integrity (Oct 2024)

Affiliations

T. Revell: Faculty of English Language and Literature, University of Oxford
W. Yeadon: Department of Physics, University of Durham
G. Cahilly-Bretzin: Department of Archaeology, Classics and Egyptology, University of Liverpool
I. Clarke: Faculty of English Language and Literature, University of Oxford
G. Manning: Faculty of English Language and Literature, University of Oxford
J. Jones: Faculty of English Language and Literature, University of Oxford
C. Mulley: Faculty of English Language and Literature, University of Oxford
R. J. Pascual: Faculty of English Language and Literature, University of Oxford
N. Bradley: Faculty of English Language and Literature, University of Oxford
D. Thomas: Faculty of English Language and Literature, University of Oxford
F. Leneghan: Faculty of English Language and Literature, University of Oxford

Journal volume & issue: Vol. 20, no. 1
pp. 1–19

Abstract

Generative AI has prompted educators to reevaluate traditional teaching and assessment methods. This study examines AI’s ability to write essays analysing Old English poetry; human markers assessed these essays and attempted to distinguish them from authentic analyses of poetry by first-year undergraduate students in English at the University of Oxford. Using the standard UK university grading system, AI-written essays averaged a score of 60.46, whilst human essays averaged 63.57, a difference that was not statistically significant (p = 0.10). Notably, student submissions applied a nuanced understanding of cultural context and secondary criticism to their close reading, while AI essays often described rather than analysed, lacking depth in the evaluation of poetic features and sometimes failing to recognise key aspects of passages. Distinguishing features of human essays included detailed and sustained analysis of poetic style, as well as spelling errors and a lack of structural cohesion. AI essays, on the other hand, exhibited a more formal structure and tone but sometimes fell short in incisive critique of poetic form and effect. Human markers correctly identified the origin of essays 79.41% of the time. Additionally, we compare three purported AI detectors and find that the best, ‘Quillbot’, correctly identified the origin of essays 95.59% of the time. However, given the high threshold for establishing academic misconduct, conclusively determining an essay’s origin remains challenging. The research also highlights the potential benefits of generative AI’s ability to advise on essay structure and suggest avenues for research. We advocate for transparency regarding AI’s capabilities and limitations, and this study underscores the importance of human critical engagement in teaching and learning in Higher Education. As AI’s proficiency grows, educators must reevaluate what constitutes authentic assessment and consider implementing dynamic, holistic methods to ensure academic integrity.

Published in International Journal for Educational Integrity

ISSN: 1833-2595 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Education: Theory and practice of education
Website: https://edintegrity.biomedcentral.com/