São Paulo Medical Journal (Apr 2020)

Relationships between Bloom’s taxonomy, judges’ estimation of item difficulty and psychometric properties of items from a progress test: a prospective observational study

  • Pedro Tadao Hamamoto Filho,
  • Eduardo Silva,
  • Zilda Maria Tosta Ribeiro,
  • Maria de Lourdes Marmorato Botta Hafner,
  • Dario Cecilio-Fernandes,
  • Angélica Maria Bicudo

DOI
https://doi.org/10.1590/1516-3180.2019.0459.r1.19112019
Journal volume & issue
Vol. 138, no. 1
pp. 33 – 39

Abstract

BACKGROUND: Progress tests are longitudinal assessments of students’ knowledge based on successive tests. Calibrating test difficulty is challenging, especially because item-writers tend to overestimate students’ performance. The relationships between the levels of Bloom’s taxonomy, the ability of test judges to predict the difficulty of test items and the real psychometric properties of test items have been insufficiently studied.

OBJECTIVE: To investigate the psychometric properties of items according to their classification in Bloom’s taxonomy and judges’ estimates, through an adaptation of the Angoff method.

DESIGN AND SETTING: Prospective observational study using secondary data on students’ performance in a progress test administered at ten medical schools, mainly in the state of São Paulo, Brazil.

METHODS: We compared the expected and real difficulty of items used in a progress test. The items were classified according to Bloom’s taxonomy. Psychometric properties were assessed based on their taxonomy and fields of knowledge.

RESULTS: There was a 54% match between the panel of experts’ expectations and the real difficulty of items. Items that were expected to be easy had a mean difficulty that was significantly lower than that of items that were expected to be medium (P < 0.05) or difficult (P < 0.01). Items with high-level taxonomy had higher discrimination indices than low-level items (P = 0.026). We did not find any significant differences between the fields in terms of difficulty and discrimination.

CONCLUSIONS: Our study demonstrated that items with high-level taxonomy performed better in discrimination indices and that a panel of experts may develop coherent reasoning regarding the difficulty of items.
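For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of how an item difficulty index, a discrimination index and an expected-versus-observed match rate are commonly computed in classical test theory. It is not the authors' analysis: the abstract does not state which formulas, scales or cut-offs were used, so the proportion-correct definition of difficulty, the upper-lower group method with a 27% fraction, all function names and the toy data below are assumptions made for illustration only.

```python
def item_difficulty(item_responses):
    """Classical difficulty index: proportion of correct answers (1 = correct, 0 = wrong).

    Assumption: the study may define or scale difficulty differently.
    """
    return sum(item_responses) / len(item_responses)


def discrimination_index(item_responses, total_scores, fraction=0.27):
    """Upper-lower discrimination: p(correct | top group) - p(correct | bottom group).

    Assumption: the 27% group fraction is a common textbook convention,
    not something stated in the abstract.
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    # Rank examinees by their total test score, lowest first.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_responses[i] for i in upper) / k
    p_lower = sum(item_responses[i] for i in lower) / k
    return p_upper - p_lower


def match_rate(expected, observed):
    """Share of items whose expected difficulty category equals the observed one."""
    hits = sum(e == o for e, o in zip(expected, observed))
    return hits / len(expected)


# Hypothetical toy data: one item answered by four examinees.
item = [0, 0, 1, 1]
totals = [1, 2, 3, 4]
p = item_difficulty(item)
d = discrimination_index(item, totals, fraction=0.5)

# Hypothetical judge estimates versus observed categories for four items.
expected = ["easy", "medium", "difficult", "easy"]
observed = ["easy", "difficult", "difficult", "medium"]
agreement = match_rate(expected, observed)
```

In this toy case half of the examinees answer the item correctly, only the higher-scoring half does so, and the judges' categories agree with the observed ones for two of four items. A match rate computed this way is the kind of figure that the 54% agreement in the RESULTS refers to, although the study's exact categorisation rules are not given here.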

Keywords