BMC Medical Research Methodology (May 2025)
Behavior of test specificity under an imperfect gold standard: findings from a simulation study and analysis of real-world oncology data
Abstract
Abstract Background Gold standards used in validation of new tests may be imperfect, with sensitivity or specificity less than 100%. The impact of imperfection in a gold standard on measured test attributes has been demonstrated formally, but its relevance in real-world oncology research may not be well understood. Methods This simulation study examined the impact of imperfect gold standard sensitivity on measured test specificity at different levels of condition prevalence for a hypothetical real-world measure of death. The study also evaluated real-world oncology datasets with a linked National Death Index (NDI) dataset, to examine the measured specificity of a death indicator at levels of death prevalence that matched the simulation. The simulation and real-world data analysis both examined measured specificity of the death indicator at death prevalence ranging from 50 to 98%. To isolate the effects of death prevalence and imperfect gold standard sensitivity, the simulation assumed a test with perfect sensitivity and specificity, and with perfect gold standard specificity. However, gold standard sensitivity was modeled at values from 90 to 99%. Results Results of the simulation showed that decreasing gold standard sensitivity was associated with increasing underestimation of test specificity, and that the extent of underestimation increased with higher death prevalence. Analysis of the real-world data yielded findings that closely matched the simulation pattern. At 98% death prevalence, near-perfect gold standard sensitivity (99%) still resulted in suppression of specificity from the true value of 100% to the measured value of < 67%. Conclusions New validation research, and review of existing validation studies, should consider the prevalence of the conditions assessed by a measure, and the possible impact on sensitivity and specificity of an imperfect gold standard.
Keywords