OTO Open (Jul 2024)
Evaluating ChatGPT as a Patient Education Tool for COVID‐19‐Induced Olfactory Dysfunction
Abstract
Objective: While most patients with COVID‐19‐induced olfactory dysfunction (OD) recover spontaneously, those with persistent OD face significant physical and psychological sequelae. ChatGPT, an artificial intelligence chatbot, has grown as a tool for patient education. This study seeks to evaluate the quality of ChatGPT‐generated responses for COVID‐19 OD.

Study Design: Quantitative observational study.

Setting: Publicly available online website.

Methods: ChatGPT (GPT‐4) was queried 4 times with 30 identical questions. Prior to questioning, ChatGPT was "prompted" to respond (1) to a patient, (2) to an eighth grader, (3) with references, and (4) with no prompt. Answer accuracy was independently scored by 4 rhinologists using the Global Quality Score (GQS, range: 1‐5). Proportions of responses at incremental score thresholds were compared using χ2 analysis. The Flesch‐Kincaid grade level was calculated for each answer, and the relationship between prompt type and grade level was assessed via analysis of variance.

Results: Across all graded responses (n = 480), 364 responses (75.8%) were "at least good" (GQS ≥ 4). The proportions of responses that were "at least good" (P < .0001) or "excellent" (GQS = 5) (P < .0001) differed by prompt; the proportion of "at least moderate" (GQS ≥ 3) responses did not (P = .687). Responses prompted at an eighth‐grade level (14.06 ± 2.3) or in patient‐friendly language (14.33 ± 2.0) had significantly lower mean grade levels than responses generated without prompting (P < .0001).

Conclusion: ChatGPT provides appropriate answers to most questions on COVID‐19 OD regardless of prompting. However, prompting influences response quality and grade level. ChatGPT responds at grade levels above accepted recommendations for presenting medical information to patients. Currently, ChatGPT offers significant potential for patient education as an adjunct to the conventional patient‐physician relationship.
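For reference, readability scoring of this kind is conventionally based on the standard Flesch‐Kincaid grade‐level formula; the equation below is the general form of that metric and is shown only as an illustration, since the abstract does not report the exact implementation used:

\[
\text{FKGL} = 0.39\left(\frac{\text{total words}}{\text{total sentences}}\right) + 11.8\left(\frac{\text{total syllables}}{\text{total words}}\right) - 15.59
\]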
Keywords