Using Large Language Models to Support Content Analysis: A Case Study of ChatGPT for Adverse Event Detection

Eric C Leas; John W Ayers; Nimit Desai; Mark Dredze; Michael Hogarth; Davey M Smith

doi:10.2196/52499

Journal of Medical Internet Research (May 2024)

Journal volume & issue: Vol. 26, p. e52499

Abstract

This study explores the potential of using large language models to assist content analysis by conducting a case study to identify adverse events (AEs) in social media posts. The case study compares ChatGPT’s performance with human annotators’ in detecting AEs associated with delta-8-tetrahydrocannabinol, a cannabis-derived product. Using the identical instructions given to human annotators, ChatGPT closely approximated human results, with a high degree of agreement noted: 94.4% (9436/10,000) for any AE detection (Fleiss κ=0.95) and 99.3% (9931/10,000) for serious AEs (κ=0.96). These findings suggest that ChatGPT has the potential to replicate human annotation accurately and efficiently. The study recognizes possible limitations, including concerns about the generalizability due to ChatGPT’s training data, and prompts further research with different models, data sources, and content analysis tasks. The study highlights the promise of large language models for enhancing the efficiency of biomedical research.
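The agreement figures in the abstract combine two quantities: raw percent agreement and a chance-corrected Fleiss κ. The sketch below shows one way such values could be computed when ChatGPT is treated as a second rater alongside a human annotator. The abstract does not describe the authors' code, so the function names (percent_agreement, fleiss_kappa) and the ten toy labels are illustrative assumptions, not the study's data or pipeline. Only the two ratios checked at the end (9436/10,000 and 9931/10,000) come from the abstract.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters assigned the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of items, each a list of labels (one per rater).

    Every item must be labeled by the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    categories = sorted({label for item in ratings for label in item})
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the proportion of agreeing rater pairs.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement from the pooled category proportions.
    proportions = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy labels (1 = post reports an adverse event, 0 = it does not).
human = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
model = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

agreement = percent_agreement(human, model)
kappa = fleiss_kappa([list(pair) for pair in zip(human, model)])
print(f"toy agreement = {agreement:.1%}, toy Fleiss kappa = {kappa:.3f}")

# Arithmetic check of the percentages reported in the abstract.
any_ae_pct = round(100 * 9436 / 10_000, 1)
serious_ae_pct = round(100 * 9931 / 10_000, 1)
print(any_ae_pct, serious_ae_pct)
```

Reporting κ alongside raw agreement matters most when one label dominates, as is likely for serious AEs: two raters who almost always answer "no" agree often by chance alone, and κ discounts that share.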

Published in Journal of Medical Internet Research

ISSN: 1438-8871 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Public aspects of medicine
Website: https://www.jmir.org
