Data quality monitoring in clinical and observational epidemiologic studies: the role of metadata and process information

Richter, Adrian; Schössow, Janka; Werner, André; Schauer, Birgit; Radke, Dörte; Henke, Jörg; Struckmann, Stephan; Schmidt, Carsten Oliver

doi:10.3205/mibe000202

GMS Medizinische Informatik, Biometrie und Epidemiologie (Nov 2019)

Affiliations

Richter, Adrian: Institute for Community Medicine, University Medicine Greifswald, Germany
Schössow, Janka: Institute for Community Medicine, University Medicine Greifswald, Germany
Werner, André: Institute for Community Medicine, University Medicine Greifswald, Germany
Schauer, Birgit: Institute for Community Medicine, University Medicine Greifswald, Germany
Radke, Dörte: Institute for Community Medicine, University Medicine Greifswald, Germany
Henke, Jörg: Institute for Community Medicine, University Medicine Greifswald, Germany
Struckmann, Stephan: Institute for Community Medicine, University Medicine Greifswald, Germany
Schmidt, Carsten Oliver: Institute for Community Medicine, University Medicine Greifswald, Germany

Journal volume & issue: Vol. 15, no. 1
p. Doc08

Abstract

High data quality is fundamental for valid inferences in health research. Metadata, i.e. “data that describe other data”, are essential for implementing data quality assessments, but more guidance is needed on which metadata to use. Similarly, the selection and use of variables describing the measurement process should be illustrated with examples to improve the design and conduct of observational health studies. This work provides a conceptual framework and an overview of metadata and process information for systematic data quality reports, based on implementations within the population-based cohort Study of Health in Pomerania (SHIP). In previous years, a prerequisite for automated data quality checks was established by augmenting the data dictionary: the added information, up to 20 different characteristics for each variable, is used for data quality assessments and triggers diverse data quality checks. Conceptually, we distinguish static metadata, variable metadata, and process variables. Examples of static metadata are the expected probability distribution, plausibility limits, and the data type. An example of variable metadata is the reference limits of a laboratory marker. The information contained in these metadata allows data quality flaws to be detected by comparing observed with expected data characteristics. In contrast, process variables, such as the observer or device ID, also allow the sources of data quality issues to be identified, even if the characteristics defined in the metadata were not violated. Metadata and process variables can be used alone or in combination to implement a versatile and efficient data quality assessment. Setting up metadata and process variables comprehensively is an extensive task, particularly in studies involving large data collections. Nonetheless, the gain in transparency and efficacy of data curation and quality reporting after this setup is considerable.
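The distinction between metadata-driven checks and process variables can be illustrated with a minimal Python sketch. It is not the SHIP implementation: the variable name `sbp`, the limits, the records, and both helper functions are hypothetical and chosen only for illustration. The first function compares observed values with the expected data type and plausibility limits stored as static metadata; the second summarises the remaining values by a process variable (here an observer ID).

```python
from statistics import mean

# Hypothetical static metadata for one variable (illustrative, not SHIP's data dictionary).
METADATA = {
    "sbp": {"data_type": float, "plausibility_limits": (60.0, 260.0)},
}

# Hypothetical records: a measured value plus a process variable (observer ID).
RECORDS = [
    {"sbp": 118.0, "observer": "A"},
    {"sbp": 122.0, "observer": "A"},
    {"sbp": 135.0, "observer": "B"},
    {"sbp": 139.0, "observer": "B"},
    {"sbp": 400.0, "observer": "A"},  # outside the plausibility limits
    {"sbp": "n/a", "observer": "B"},  # not of the expected data type
]


def check_against_metadata(records, variable, metadata):
    """Return (row index, reason) pairs for values that contradict the metadata."""
    spec = metadata[variable]
    low, high = spec["plausibility_limits"]
    issues = []
    for i, rec in enumerate(records):
        value = rec[variable]
        if not isinstance(value, spec["data_type"]):
            issues.append((i, "data type mismatch"))
        elif not low <= value <= high:
            issues.append((i, "outside plausibility limits"))
    return issues


def mean_by_process_variable(records, variable, process_variable, skip):
    """Mean of `variable` per level of `process_variable`, ignoring rows in `skip`."""
    groups = {}
    for i, rec in enumerate(records):
        if i in skip:
            continue
        groups.setdefault(rec[process_variable], []).append(rec[variable])
    return {level: mean(values) for level, values in groups.items()}


issues = check_against_metadata(RECORDS, "sbp", METADATA)
flagged = {i for i, _ in issues}
observer_means = mean_by_process_variable(RECORDS, "sbp", "observer", flagged)
```

In this toy data set, the metadata checks flag only the last two rows, while the per-observer means differ although every remaining value lies within the plausibility limits. This mirrors the point that process variables can point to a source of data quality issues even when no metadata-defined characteristic is violated.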

Published in GMS Medizinische Informatik, Biometrie und Epidemiologie

ISSN: 1860-9171 (Online)
Publisher: German Medical Science GMS Publishing House
Country of publisher: Germany
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Internal medicine: Infectious and parasitic diseases
Website: https://www.egms.de/dynamic/en/journals/mibe/index.htm
