PLoS ONE (Jan 2014)

A transparent and transferable framework for tracking quality information in large datasets.

  • Derek E Smith,
  • Stefan Metzger,
  • Jeffrey R Taylor

DOI
https://doi.org/10.1371/journal.pone.0112249
Journal volume & issue
Vol. 9, no. 11
p. e112249

Abstract


The ability to evaluate the validity of data is essential to any investigation, and manual "eyes on" assessments of data quality have dominated in the past. Yet, as the size of collected data continues to increase, so does the effort required to assess their quality. This challenge is of particular concern for networks that automate their data collection, and has resulted in the automation of many quality assurance and quality control analyses. Unfortunately, the interpretation of the resulting data quality flags can become quite challenging with large datasets. We have developed a framework to summarize data quality information and facilitate interpretation by the user. Our framework consists of first compiling data quality information and then presenting it through two separate mechanisms: a quality report and a quality summary. The quality report presents the results of specific quality analyses as they relate to individual observations, while the quality summary takes a spatial or temporal aggregate of each quality analysis and provides a summary of the results. Included in the quality summary is a final quality flag, which further condenses data quality information to indicate whether a data product is valid or not. This framework has the added flexibility to allow "eyes on" information on data quality to be incorporated for many data types. Furthermore, this framework can aid problem tracking and resolution, should sensor or system malfunctions arise.
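To make the two mechanisms concrete, the sketch below illustrates the general idea in Python: per-observation flags from several automated tests form the quality report, and per-test pass/fail/not-evaluated fractions plus a final quality flag form the quality summary. The test names, the flag encoding (0 = pass, 1 = fail, -1 = could not be evaluated), and the 20% threshold are illustrative assumptions for this sketch, not the paper's specification.

```python
import numpy as np

# Hypothetical per-observation quality flags for three illustrative tests
# (0 = test passed, 1 = test failed, -1 = test could not be evaluated).
rng = np.random.default_rng(0)
n_obs = 1000
quality_report = {
    "range_test": rng.choice([0, 1, -1], size=n_obs, p=[0.95, 0.03, 0.02]),
    "spike_test": rng.choice([0, 1, -1], size=n_obs, p=[0.97, 0.02, 0.01]),
    "null_test":  rng.choice([0, 1, -1], size=n_obs, p=[0.99, 0.01, 0.00]),
}

def summarize(flags: np.ndarray) -> dict:
    """Aggregate one test's flags into pass/fail/not-evaluated percentages."""
    n = flags.size
    return {
        "pass_pct": 100.0 * np.count_nonzero(flags == 0) / n,
        "fail_pct": 100.0 * np.count_nonzero(flags == 1) / n,
        "na_pct":   100.0 * np.count_nonzero(flags == -1) / n,
    }

# Quality summary: one aggregate row per test over the reporting period.
quality_summary = {test: summarize(flags) for test, flags in quality_report.items()}

# Final quality flag: raised when any test's combined failed / not-evaluated
# fraction exceeds a threshold (20% here is an arbitrary illustrative value).
THRESHOLD_PCT = 20.0
final_quality_flag = int(any(
    s["fail_pct"] + s["na_pct"] > THRESHOLD_PCT for s in quality_summary.values()
))

for test, s in quality_summary.items():
    print(f"{test}: pass {s['pass_pct']:.1f}%  fail {s['fail_pct']:.1f}%  NA {s['na_pct']:.1f}%")
print("final quality flag:", final_quality_flag)
```

The separation mirrors the framework's intent: the quality report preserves observation-level detail for problem tracking, while the quality summary condenses that detail into a quick validity assessment for the end user.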