International Journal of Population Data Science (Sep 2018)

Inferring sensitivity and specificity of phenotyping algorithms using positive and negative predictive value in validation study in observational health data

  • Mingkai Peng,
  • Rosa Gini,
  • Tyler Williamson

DOI
https://doi.org/10.23889/ijpds.v3i4.951
Journal volume & issue
Vol. 3, no. 4

Abstract


Introduction: In observational health data, phenotyping algorithms are needed to process raw information into clinically relevant features. Validation studies traditionally estimate sensitivity and specificity by comparing the phenotyping algorithm with a reference standard on a population sample. Such studies are difficult to conduct for conditions with low prevalence, because a population sample contains few true cases.

Objectives and Approach: We propose a novel and efficient method for conducting validation studies that estimates sensitivity and specificity indirectly, from the positive and negative predictive values. We simulated datasets with different levels of disease prevalence, and phenotyping algorithms with different sensitivities and specificities. We applied both the traditional (direct) and the new (indirect) method to the simulated data to estimate sensitivity and specificity, and compared the performance of the two methods. We also designed a gate that excludes true negatives, to improve study efficiency for conditions with low prevalence, and conducted a sensitivity analysis on an imperfect gate.

Results: The new (indirect) method estimated both sensitivity and specificity with accuracy better than or comparable to that of the traditional (direct) method. Applying a gate made it possible to conduct validation studies in conditions with very low prevalence. An imperfect gate results in overestimation of sensitivity but has minimal effect on specificity.

Conclusion/Implications: The new (indirect) method provides an alternative way to conduct validation studies in observational health data, with improved estimation accuracy.
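The abstract does not spell out the indirect estimator. The following is a minimal sketch, assuming it rests on the standard Bayes identity linking predictive values to sensitivity and specificity. Let p be the proportion of the whole database flagged positive by the algorithm (known, since the algorithm runs on every record). Then sensitivity = PPV·p / (PPV·p + (1 − NPV)(1 − p)) and specificity = NPV(1 − p) / (NPV(1 − p) + (1 − PPV)p). The function names `indirect_sens_spec` and `predictive_values` and the example counts are hypothetical. The sketch also ignores the gate and sampling uncertainty, so the paper's actual estimator may differ.

```python
from __future__ import annotations


def predictive_values(prevalence: float, sens: float, spec: float) -> tuple[float, float, float]:
    """Hypothetical helper: PPV, NPV and P(algorithm positive) implied by a
    true prevalence, sensitivity and specificity (useful for simulation)."""
    p_alg_pos = sens * prevalence + (1.0 - spec) * (1.0 - prevalence)
    ppv = sens * prevalence / p_alg_pos
    npv = spec * (1.0 - prevalence) / (1.0 - p_alg_pos)
    return ppv, npv, p_alg_pos


def indirect_sens_spec(ppv: float, npv: float, p_alg_pos: float) -> tuple[float, float]:
    """Back-calculate sensitivity and specificity from PPV, NPV and the
    proportion of the full population flagged positive by the algorithm.

    Assumes PPV and NPV come from random samples of algorithm-positive and
    algorithm-negative records reviewed against the reference standard."""
    if not 0.0 < p_alg_pos < 1.0:
        raise ValueError("p_alg_pos must be strictly between 0 and 1")
    p_alg_neg = 1.0 - p_alg_pos

    true_pos = ppv * p_alg_pos            # P(A+, D+)
    false_neg = (1.0 - npv) * p_alg_neg   # P(A-, D+)
    true_neg = npv * p_alg_neg            # P(A-, D-)
    false_pos = (1.0 - ppv) * p_alg_pos   # P(A+, D-)

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical validation sample: 100 algorithm-positive charts (50 confirmed
# cases) and 200 algorithm-negative charts (198 confirmed non-cases), with 2%
# of the database flagged by the algorithm.
sens, spec = indirect_sens_spec(ppv=50 / 100, npv=198 / 200, p_alg_pos=0.02)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
# prints sensitivity=0.505, specificity=0.990
```

Only the predictive values need chart review here. Both review samples are drawn within algorithm strata rather than from the whole population, so a rare condition still contributes cases to the algorithm-positive sample. Feeding the output of `predictive_values` back into `indirect_sens_spec` recovers the original sensitivity and specificity, which is one way to check the identity in a simulation.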