Journal of Medical Education and Curricular Development (Feb 2024)
Validating Parallel-Forms Tests for Assessing Anesthesia Resident Knowledge
Abstract
We created a serious game to teach first-year anesthesiology (CA-1) residents to perform general anesthesia for cesarean delivery. We aimed to investigate resident knowledge gains after residents played the game and received 1 of 2 debriefing modalities. Here we report on the development and validation of scores from parallel test forms intended for criterion-referenced interpretations of resident knowledge; the forms were designed to serve as the pre- and posttests for that experiment. We considered validation of the instruments measuring the study's primary outcome essential for adding rigor to the planned experiment, so that its results could be trusted. Development of the parallel multiple-choice test forms comprised 4 steps: (1) specification of the assessment purpose and population; (2) specification of the content domain and writing or selection of items; (3) expert content validation of items paired by topic and cognitive level; and (4) empirical validation of scores from the parallel test forms using Classical Test Theory (CTT) techniques. Field testing involved online administration of 52 items from both test forms, presented in shuffled order, to 24 CA-1 residents, 21 second-year anesthesiology (CA-2) residents, 2 fellows, 1 attending anesthesiologist, and 1 participant of unknown rank at 3 US institutions. Items from each form yielded near-normal score distributions with similar medians, ranges, and standard deviations. CTT item difficulty (item p values) and discrimination (D) indices indicated that most items met the assumptions of criterion-referenced test design, separating experienced from novice residents. Experienced residents achieved higher overall domain scores than novices (P < .05). Kuder-Richardson Formula 20 (KR-20) reliability estimates for both test forms exceeded the acceptability cutoff of .70, and the parallel-forms reliability estimate was high (.86), indicating that the results were consistent with theoretical expectations.
Total scores on the parallel test forms demonstrated item-level validity, strong internal consistency, and high parallel-forms reliability, suggesting that the forms are sufficiently robust for assessing knowledge outcomes in CA-1 residents.
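The CTT statistics named in the abstract can be illustrated with a minimal Python sketch operating on 0/1 item scores. This is not the authors' analysis code, and several details are assumptions: the function names and the 6-examinee, 4-item response matrices are hypothetical toy data (not study data); D is computed with a criterion-groups method, as the difference in proportion correct between experienced and novice examinees; KR-20 uses the population variance of total scores; and parallel-forms reliability is taken to be the Pearson correlation between total scores on the two forms.

```python
from math import sqrt


def item_difficulty(responses):
    """CTT item p value: proportion of examinees answering each item correctly.

    responses: list of examinee rows, each a list of 0/1 item scores.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]


def discrimination_index(experienced, novice):
    """Assumed criterion-groups D: p(experienced) - p(novice) for each item."""
    p_exp = item_difficulty(experienced)
    p_nov = item_difficulty(novice)
    return [pe - pn for pe, pn in zip(p_exp, p_nov)]


def kr20(responses):
    """KR-20 = k/(k-1) * (1 - sum(p_i * q_i) / var(total scores)).

    Uses the population variance of total scores (an assumption; some
    texts use the sample variance). Raises ZeroDivisionError if every
    examinee has the same total score.
    """
    k = len(responses[0])
    p = item_difficulty(responses)
    totals = [sum(row) for row in responses]
    mean = sum(totals) / len(totals)
    var_total = sum((t - mean) ** 2 for t in totals) / len(totals)
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def pearson_r(x, y):
    """Pearson correlation, used here as the parallel-forms reliability estimate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: rows 0-2 are novices, rows 3-5 are experienced examinees.
form_a = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
]
form_b = [
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 1, 1],
]

print("Item p values (form A):", item_difficulty(form_a))
print("D indices (form A):", discrimination_index(form_a[3:], form_a[:3]))
print(f"KR-20 form A: {kr20(form_a):.3f}; form B: {kr20(form_b):.3f}")
totals_a = [sum(row) for row in form_a]
totals_b = [sum(row) for row in form_b]
print(f"Parallel-forms r: {pearson_r(totals_a, totals_b):.3f}")
```

Matrices this small are only useful for checking the arithmetic. With real field-test data, each form's KR-20 would be compared against the .70 cutoff cited above, and items with low or negative D would be candidates for review.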