Revista Española de Educación Médica (Nov 2024)
Comparison of Automatic Item Generation Methods in the Assessment of Clinical Reasoning Skills
Abstract
Automatic item generation (AIG) methods offer potential for assessing clinical reasoning (CR), a critical skill that combines intuitive and analytical thinking, in medical education. In preclinical education, CR is commonly evaluated through written exams and case-based multiple-choice questions (MCQs), which are widely used because of large class sizes, ease of standardization, and speed of scoring. This study generated CR-focused questions for medical exams using two principal AIG methods: a template-based approach and a non-template-based approach (using AI tools such as ChatGPT for greater flexibility). A total of 18 questions were produced on ordering radiologic investigations for abdominal emergencies and were compared with faculty-developed questions used in previous medical exams. Experienced radiologists evaluated the questions for clarity, clinical relevance, and effectiveness in measuring CR skills. ChatGPT-generated questions measured CR skills with an 84.52% success rate, faculty-developed questions with 82.14%, and template-based questions with 78.57%, indicating that both AIG methods are effective for CR assessment, with ChatGPT performing slightly better. Both AIG methods received high ratings for clarity and clinical suitability, showing promise for producing CR-assessing questions comparable to, and in some cases surpassing, faculty-developed questions. Although template-based AIG is effective, it demands more time and effort than the ChatGPT-based approach; nevertheless, both methods may save educators time in exam preparation.
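For readers unfamiliar with template-based AIG, the sketch below illustrates the general slot-filling idea in Python: a question template with variable slots is combined with lists of interchangeable values to produce item variants. The template, slot values, and answer options are hypothetical illustrations only and are not the materials used in the study; a real workflow would also constrain combinations to clinically coherent vignettes.

```python
# Minimal sketch of template-based automatic item generation (AIG).
# The template, slot values, and options are illustrative only; they are
# not the templates or clinical content developed in the study.
from itertools import product

TEMPLATE = (
    "A {age}-year-old {sex} presents to the emergency department with "
    "{complaint}. Examination reveals {finding}. "
    "Which imaging study should be ordered first?"
)

# Each slot lists interchangeable values; combining them yields item variants.
SLOTS = {
    "age": ["25", "68"],
    "sex": ["man", "woman"],
    "complaint": ["right lower quadrant pain", "sudden severe epigastric pain"],
    "finding": ["rebound tenderness", "a rigid abdomen"],
}

# A fixed option set for the sketch; real items would pair each vignette
# with its own keyed answer and plausible distractors.
OPTIONS = ["Abdominal ultrasound", "Erect chest X-ray",
           "CT abdomen with contrast", "MRI abdomen"]


def generate_items(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Fill every combination of slot values into the template."""
    keys = list(slots)
    items = []
    for values in product(*(slots[k] for k in keys)):
        items.append(template.format(**dict(zip(keys, values))))
    return items


if __name__ == "__main__":
    for i, stem in enumerate(generate_items(TEMPLATE, SLOTS), start=1):
        print(f"Q{i}: {stem}")
        for letter, option in zip("ABCD", OPTIONS):
            print(f"  {letter}) {option}")
```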
Keywords