BMC Public Health (Dec 2019)
The impact of data quality and source data verification on epidemiologic inference: a practical application using HIV observational data
Abstract
Background
Data audits are often evaluated soon after completion, even though the identification of systematic issues may lead to additional data quality improvements in the future. In this study, we assess the impact of the entire data audit process on subsequent statistical analyses.

Methods
We conducted on-site audits of datasets from nine international HIV care sites. Error rates were quantified for key demographic and clinical variables among a subset of records randomly selected for auditing. Based on audit results, some sites were tasked with targeted validation of high-error-rate variables, resulting in a post-audit dataset. We estimated the times from antiretroviral therapy initiation until death and until first AIDS-defining event using the pre-audit data, the audit data, and the post-audit data.

Results
The overall discrepancy rate between pre-audit and audit data (n = 250) across all audited variables was 17.1%. The estimated probability of mortality and of an AIDS-defining event over time was higher in the audited data than in the pre-audit data. Among patients represented in both the post-audit and pre-audit cohorts (n = 18,999), AIDS and mortality estimates were also higher in the post-audit data.

Conclusion
Though some changes may have occurred independently, our findings suggest that improved data quality following the audit may impact epidemiological inferences.
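An overall discrepancy rate like the one reported for the audited records can be illustrated with a minimal sketch. It assumes that each audited record is paired one-to-one with its source-verified counterpart and that the rate pools every (record, variable) comparison into a single proportion. The function `discrepancy_rate` and the field names `sex` and `art_start` are hypothetical, and the study's actual rules (for example, how missing values or date tolerances were handled) may differ.

```python
from typing import Dict, List

# Hypothetical record layout: variable name -> recorded value.
Record = Dict[str, object]


def discrepancy_rate(pre_audit: List[Record], audit: List[Record],
                     variables: List[str]) -> float:
    """Pooled share of (record, variable) pairs whose pre-audit value
    differs from the value verified against source documents.

    Assumption: records are already aligned so that pre_audit[i] and
    audit[i] describe the same patient; exact equality is used, so two
    missing values (None) count as agreement.
    """
    if len(pre_audit) != len(audit):
        raise ValueError("pre-audit and audit records must be paired one-to-one")
    compared = 0
    discrepant = 0
    for before, verified in zip(pre_audit, audit):
        for var in variables:
            compared += 1
            if before.get(var) != verified.get(var):
                discrepant += 1
    # Avoid division by zero when nothing was audited.
    return discrepant / compared if compared else 0.0
```

For example, two records audited on two variables with one mismatched ART start date give a rate of 1/4 = 0.25. Pooling all comparisons weights each variable by how often it was audited; a per-variable breakdown would instead loop over `variables` in the outer position and report one rate per variable.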
Keywords