Ecosphere (Nov 2019)

Data system design alters meaning in ecological data: salmon habitat restoration across the U.S. Pacific Northwest

  • Stephen L. Katz,
  • Katie A. Barnas,
  • Monica Diaz,
  • Stephanie E. Hampton

DOI
https://doi.org/10.1002/ecs2.2920
Journal volume & issue
Vol. 10, no. 11

Abstract

As an increasing variety and complexity of environmental issues confront scientists and natural resource managers, assembling the most relevant and informative data into accessible data systems becomes critical to timely problem solving. Data interoperability is the key criterion for succeeding in that assembly, and much informatics research is focused on data federation, or synthesis to produce interoperable data. However, when candidate data come from numerous, diverse, and high-value legacy data sources, data variety, or heterogeneity, can be a significant impediment to interoperability. Research in informatics, computer science, and philosophy has frequently focused on resolving data heterogeneity with automation, but subject matter expertise still plays a large role. In particular, human expertise is a large component in the development of tools such as data dictionaries, crosswalks, and ontologies. Such representations may not always match from one data system to another, and so can yield inconsistent results even from the same data. Here, we use a long-term data set on management actions designed to improve stream habitat for endangered salmon in the Pacific Northwest to illustrate how different representations can change the underlying information content in the data system. We pass the same data set, comprising 49,619 records, through three ontologies, each developed to address a rational management need, and show that the inferences drawn from the data can change with the choice of data representation or ontology. In one striking example, one ontology suggests that water quality improvement projects are the rarest and most expensive restoration actions undertaken, while another suggests that these actions are the most common and least expensive type of management action. The discrepancy relates to the origins of the data dictionaries themselves, with one designed to catalog management actions and the other focused on ecological processes. Thus, we argue that in data federation efforts humans are “in the loop” rationally, in the form of the ontologies they have chosen, and diminishing the human component in favor of automation carries risks. Consequently, data federation exercises should be accompanied by validations in order to evaluate and manage those risks.
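
The effect described in the abstract can be illustrated with a minimal sketch. Everything in it is hypothetical: the records, costs, category names, and the two crosswalks are invented for illustration and are not taken from the paper's data set or its three ontologies. The sketch only shows how one set of records, passed through an action-oriented mapping and a process-oriented mapping, can produce opposite rankings for a "water quality" category.

```python
from collections import defaultdict

# Hypothetical project records: (native action label, cost in USD).
# These values are invented for illustration only.
RECORDS = [
    ("riparian planting", 10_000),
    ("riparian planting", 12_000),
    ("livestock fencing", 8_000),
    ("livestock fencing", 9_000),
    ("wastewater treatment upgrade", 500_000),
    ("culvert replacement", 150_000),
    ("culvert replacement", 170_000),
    ("culvert replacement", 160_000),
]

# Hypothetical crosswalk A: categories follow the management action taken.
ACTION_CROSSWALK = {
    "riparian planting": "riparian",
    "livestock fencing": "riparian",
    "wastewater treatment upgrade": "water quality",
    "culvert replacement": "fish passage",
}

# Hypothetical crosswalk B: categories follow the ecological process affected.
PROCESS_CROSSWALK = {
    "riparian planting": "water quality",
    "livestock fencing": "water quality",
    "wastewater treatment upgrade": "water quality",
    "culvert replacement": "connectivity",
}


def summarize(records, crosswalk):
    """Return {category: (record count, mean cost)} under one crosswalk."""
    totals = defaultdict(lambda: [0, 0])  # category -> [count, summed cost]
    for label, cost in records:
        category = crosswalk[label]
        totals[category][0] += 1
        totals[category][1] += cost
    return {cat: (n, total / n) for cat, (n, total) in totals.items()}


if __name__ == "__main__":
    for name, crosswalk in [("action", ACTION_CROSSWALK),
                            ("process", PROCESS_CROSSWALK)]:
        print(name)
        for category, (count, mean_cost) in summarize(RECORDS, crosswalk).items():
            print(f"  {category}: n={count}, mean cost={mean_cost:,.0f}")
```

With these invented records, the action-oriented crosswalk makes "water quality" the rarest and costliest category, while the process-oriented crosswalk makes it the most common and cheapest per project. A validation step of the kind the authors call for could be as simple as comparing such summaries across candidate representations before drawing conclusions.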

Keywords