PeerJ Computer Science (Apr 2017)

Interoperability and FAIRness through a novel combination of Web technologies

  • Mark D. Wilkinson,
  • Ruben Verborgh,
  • Luiz Olavo Bonino da Silva Santos,
  • Tim Clark,
  • Morris A. Swertz,
  • Fleur D.L. Kelpin,
  • Alasdair J.G. Gray,
  • Erik A. Schultes,
  • Erik M. van Mulligen,
  • Paolo Ciccarese,
  • Arnold Kuzniar,
  • Anand Gavai,
  • Mark Thompson,
  • Rajaram Kaliyaperumal,
  • Jerven T. Bolleman,
  • Michel Dumontier

DOI
https://doi.org/10.7717/peerj-cs.110
Journal volume & issue
Vol. 3
p. e110

Abstract

Data in the life sciences are extremely diverse and are stored in a broad spectrum of repositories ranging from those designed for particular data types (such as KEGG for pathway data or UniProt for protein data) to those that are general-purpose (such as FigShare, Zenodo, Dataverse or EUDAT). These data have widely different levels of sensitivity and security considerations. For example, clinical observations about genetic mutations in patients are highly sensitive, while observations of species diversity are generally not. The lack of uniformity in data models from one repository to another, and in the richness and availability of metadata descriptions, makes integration and analysis of these data a manual, time-consuming task that does not scale. Here we explore a set of resource-oriented Web design patterns for data discovery, accessibility, transformation, and integration that can be implemented by any general- or special-purpose repository as a means to assist users in finding and reusing their data holdings. We show that by using off-the-shelf technologies, interoperability can be achieved at the level of an individual spreadsheet cell. We note that the behaviours of this architecture compare favourably to the desiderata defined by the FAIR Data Principles, and can therefore represent an exemplar implementation of those principles. The proposed interoperability design patterns may be used to improve discovery and integration of both new and legacy data, maximizing the utility of all scholarly outputs.
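
To illustrate the kind of resource-oriented pattern the abstract refers to, the Python sketch below dereferences a repository-level container document and lists the machine-readable data records it links to. This is a minimal sketch only: the container URL, the choice of LDP and Dublin Core vocabularies, and the function name are assumptions made here for illustration, not resources or code taken from the article.

    from rdflib import Graph, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    # LDP-style containment vocabulary; an illustrative choice, not necessarily
    # the vocabulary exposed by any given repository.
    LDP = Namespace("http://www.w3.org/ns/ldp#")

    def list_records(container_url):
        """Dereference a repository-level container document and return
        (record URI, title) pairs for the data records it points to."""
        g = Graph()
        g.parse(container_url)  # rdflib fetches the URL and parses the RDF it returns
        container = URIRef(container_url)
        records = []
        for record in g.objects(container, LDP.contains):
            title = g.value(record, DCTERMS.title)
            records.append((str(record), str(title) if title is not None else None))
        return records

    # Hypothetical usage; example.org is a placeholder, not a real repository.
    for uri, title in list_records("https://example.org/repository/container"):
        print(uri, "-", title)

The point of the pattern is that a generic client needs nothing beyond standard HTTP and RDF parsing to discover what a repository holds, which is what makes it applicable to both general- and special-purpose repositories.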

Keywords