X-search: an open access interface for cross-cohort exploration of the National Sleep Research Resource

Licong Cui; Ningzhou Zeng; Matthew Kim; Remo Mueller; Emily R. Hankosky; Susan Redline; Guo-Qiang Zhang

doi:10.1186/s12911-018-0682-y

BMC Medical Informatics and Decision Making (Nov 2018)


Affiliations

Licong Cui: Department of Computer Science, University of Kentucky
Ningzhou Zeng: Department of Computer Science, University of Kentucky
Matthew Kim: Brigham and Women’s Hospital
Remo Mueller: Brigham and Women’s Hospital
Emily R. Hankosky: Institute for Biomedical Informatics, University of Kentucky
Susan Redline: Brigham and Women’s Hospital
Guo-Qiang Zhang: Department of Computer Science, University of Kentucky

DOI: https://doi.org/10.1186/s12911-018-0682-y
Journal volume & issue: Vol. 18, no. 1
pp. 1 – 10

Abstract


Background

The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at the study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies.

Methods

X-search has been designed as a general framework with two loosely coupled components: a semantically annotated data repository and a cross-cohort exploration engine. The semantically annotated data repository comprises a canonical data dictionary, data sources each with its own data dictionary, and mappings between each individual data dictionary and the canonical data dictionary. The cross-cohort exploration engine consists of five modules: query builder, graphical exploration, case-control exploration, query translation, and query execution. The canonical data dictionary serves as the unified metadata to drive the visual exploration interfaces and facilitate query translation through the mappings.

Results

X-search is publicly available at https://www.x-search.net/ with nine NSRR datasets consisting of over 26,000 unique subjects. The canonical data dictionary contains over 900 common data elements across the datasets. X-search has received over 1800 cross-cohort queries by users from 16 countries.

Conclusions

X-search provides a powerful cross-cohort exploration interface for querying and exploring heterogeneous datasets in the NSRR data repository, so as to enable researchers to evaluate the feasibility of potential research studies and generate potential hypotheses using the NSRR data.
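
The role the abstract assigns to the mappings (translating a query written against the canonical data dictionary into each dataset's own variables, then returning per-cohort counts) can be illustrated with a small sketch. This is not the X-search implementation: the dataset names, variable names, records, and the functions `translate` and `count_subjects` are all hypothetical and chosen only to show the general idea.

```python
from dataclasses import dataclass

# Hypothetical mappings: canonical data element -> dataset-specific variable.
# All dataset and variable names below are invented for illustration.
MAPPINGS = {
    "age": {"dataset_a": "age_s1", "dataset_b": "ageyears"},
    "bmi": {"dataset_a": "bmi_s1"},  # dataset_b has no mapped BMI variable
}

# Toy subject records, keyed by each dataset's own variable names.
DATASETS = {
    "dataset_a": [
        {"age_s1": 64, "bmi_s1": 31.2},
        {"age_s1": 45, "bmi_s1": 24.0},
        {"age_s1": 71, "bmi_s1": 28.5},
    ],
    "dataset_b": [
        {"ageyears": 52},
        {"ageyears": 67},
    ],
}


@dataclass(frozen=True)
class RangeCriterion:
    """A criterion on a canonical data element: low <= value <= high."""
    element: str
    low: float
    high: float


def translate(criteria, dataset):
    """Rewrite canonical criteria in one dataset's variable names.

    Returns None when any canonical element has no mapping for the dataset,
    meaning the query cannot be answered there.
    """
    translated = []
    for c in criteria:
        local_name = MAPPINGS.get(c.element, {}).get(dataset)
        if local_name is None:
            return None
        translated.append((local_name, c.low, c.high))
    return translated


def count_subjects(criteria):
    """Return {dataset: matching subject count, or None if not answerable}."""
    counts = {}
    for name, records in DATASETS.items():
        local_criteria = translate(criteria, name)
        if local_criteria is None:
            counts[name] = None
            continue
        counts[name] = sum(
            all(var in rec and low <= rec[var] <= high
                for var, low, high in local_criteria)
            for rec in records
        )
    return counts
```

In this toy setup, a query for subjects aged 50 to 80 is answered by both hypothetical datasets, whereas adding a BMI criterion makes the query unanswerable for the dataset that lacks a BMI mapping. Keeping translation separate from execution mirrors the loose coupling described in the abstract, though how X-search realizes these modules is not specified here.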

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
