Cell Genomics (Nov 2021)

Empirical validation of an automated approach to data use oversight

  • Moran N. Cabili,
  • Jonathan Lawson,
  • Andrea Saltzman,
  • Greg Rushton,
  • Pearl O’Rourke,
  • John Wilbanks,
  • Laura Lyman Rodriguez,
  • Tommi Nyronen,
  • Mélanie Courtot,
  • Stacey Donnelly,
  • Anthony A. Philippakis

Journal volume & issue
Vol. 1, no. 2
p. 100031

Abstract

Summary: The current paradigm for data use oversight of biomedical datasets is onerous, extending the timescale and resources needed to obtain access for secondary analyses and thus hindering scientific discovery. For a researcher to use a controlled-access dataset, a data access committee (DAC) must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH Data Use Ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006 and find an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute’s data repository. We were able to structure 118/123 (96%) of the evaluated DULs and 52/52 (100%) of the research proposals using DUO terminology, and we find that DUOS’ automated data access adjudication agreed with the DAC manual review in all cases. This first empirical evaluation of the feasibility of automated data use oversight demonstrates comparable accuracy to human-based data access oversight in real-world data governance.
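
To illustrate what "computable" data use oversight could look like, the following is a minimal sketch of checking a structured research purpose against a structured DUL. It is not the DUOS matching algorithm, which the abstract does not describe. The class names, the `is_consistent` function, and the three permission labels (`GRU` for general research use, `HMB` for health/medical/biomedical research, `DS` for disease-specific research) are assumptions chosen for illustration, loosely modeled on DUO's permission categories.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataUseLimitation:
    """Hypothetical structured DUL for one dataset."""
    permission: str                  # "GRU", "HMB", or "DS" (illustrative labels)
    disease: Optional[str] = None    # only meaningful when permission == "DS"
    non_commercial_only: bool = False


@dataclass(frozen=True)
class ResearchPurpose:
    """Hypothetical structured summary of a research proposal."""
    category: str                    # "general", "health", or "disease"
    disease: Optional[str] = None    # only meaningful when category == "disease"
    commercial: bool = False


def is_consistent(dul: DataUseLimitation, purpose: ResearchPurpose) -> bool:
    """Return True if the proposed use falls within the dataset's limitation."""
    # A modifier can rule out a request regardless of the permission term.
    if dul.non_commercial_only and purpose.commercial:
        return False
    if dul.permission == "GRU":
        return True
    if dul.permission == "HMB":
        return purpose.category in {"health", "disease"}
    if dul.permission == "DS":
        # Simplification: exact string match. A real system would need an
        # ontology lookup so that a subtype of the disease also matches.
        return purpose.category == "disease" and purpose.disease == dul.disease
    # Unknown permission: do not approve automatically.
    return False


if __name__ == "__main__":
    dul = DataUseLimitation(permission="DS", disease="diabetes")
    print(is_consistent(dul, ResearchPurpose("disease", "diabetes")))
    print(is_consistent(dul, ResearchPurpose("disease", "asthma")))
```

Returning `False` for an unrecognized permission is a deliberate choice in this sketch: a request that cannot be structured would fall back to manual committee review rather than being approved automatically.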

Keywords