PeerJ (Aug 2016)
The health care and life sciences community profile for dataset descriptions
- Michel Dumontier,
- Alasdair J.G. Gray,
- M. Scott Marshall,
- Vladimir Alexiev,
- Peter Ansell,
- Gary Bader,
- Joachim Baran,
- Jerven T. Bolleman,
- Alison Callahan,
- José Cruz-Toledo,
- Pascale Gaudet,
- Erich A. Gombocz,
- Alejandra N. Gonzalez-Beltran,
- Paul Groth,
- Melissa Haendel,
- Maori Ito,
- Simon Jupp,
- Nick Juty,
- Toshiaki Katayama,
- Norio Kobayashi,
- Kalpana Krishnaswami,
- Camille Laibe,
- Nicolas Le Novère,
- Simon Lin,
- James Malone,
- Michael Miller,
- Christopher J. Mungall,
- Laurens Rietveld,
- Sarala M. Wimalaratne,
- Atsuko Yamaguchi
Affiliations
- Michel Dumontier
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- Alasdair J.G. Gray
- Department of Computer Science, Heriot-Watt University, Edinburgh, United Kingdom
- M. Scott Marshall
- Department of Radiation Oncology (MAASTRO), GROW - School for Oncology and Developmental Biology, MAASTRO Clinic, Maastricht, Netherlands
- Vladimir Alexiev
- Ontotext Corporation, Sofia, Bulgaria
- Peter Ansell
- CSIRO, Australia
- Gary Bader
- The Donnelly Centre, University of Toronto, Toronto, Canada
- Joachim Baran
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- Jerven T. Bolleman
- Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
- Alison Callahan
- Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, United States of America
- José Cruz-Toledo
- Carleton University, Canada
- Pascale Gaudet
- CALIPHO group, SIB Swiss Institute of Bioinformatics, Geneve, Switzerland
- Erich A. Gombocz
- IO Informatics, Berkeley, CA, United States of America
- Alejandra N. Gonzalez-Beltran
- Oxford e-Research Centre, University of Oxford, Oxford, Oxfordshire, United Kingdom
- Paul Groth
- Elsevier Labs, Netherlands
- Melissa Haendel
- Department of Medical Informatics and Epidemiology, Oregon Health Sciences University, Portland, OR, United States of America
- Maori Ito
- Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Chiyoda-ku, Japan
- Simon Jupp
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Nick Juty
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Toshiaki Katayama
- Database Center for Life Science, Kashiwa, Japan
- Norio Kobayashi
- Advanced Center for Computing and Communication, RIKEN, Wako-shi, Saitama, Japan
- Kalpana Krishnaswami
- Cerenode Inc., United States of America
- Camille Laibe
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Nicolas Le Novère
- The Babraham Institute, Cambridge, United Kingdom
- Simon Lin
- Nationwide Children’s Hospital, Columbus, OH, United States of America
- James Malone
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Michael Miller
- Institute for Systems Biology, Seattle, WA, United States of America
- Christopher J. Mungall
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, United States of America
- Laurens Rietveld
- Department of Exact Sciences, VU University Amsterdam, Amsterdam, Netherlands
- Sarala M. Wimalaratne
- EMBL, European Bioinformatics Institute, Saffron Walden, United Kingdom
- Atsuko Yamaguchi
- Database Center for Life Science, Kashiwa, Japan
- DOI
- https://doi.org/10.7717/peerj.2331
- Journal volume & issue
- Vol. 4, p. e2331
Abstract
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. To provide a practical guide for producing high-quality descriptions of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. It reuses existing vocabularies and is intended to meet key functional requirements, including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
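As a rough illustration of what a description assembled from reused RDF vocabularies might look like, the following Python sketch emits a small Turtle document for a versioned dataset. Everything specific here is an assumption made for the example: the `ex:` URIs, the dataset itself, the helper names (`turtle`, `literal`, `describe_dataset`), and the particular selection of Dublin Core, DCAT, PAV, and VoID terms. It should not be read as the guideline's normative element list.

```python
# Illustrative sketch only: URIs, values, and the choice of properties are
# assumptions for this example, not the profile's normative element list.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "pav": "http://purl.org/pav/",
    "void": "http://rdfs.org/ns/void#",
    "ex": "http://example.org/dataset/",  # hypothetical namespace
}


def literal(text):
    """Return a Turtle string literal with backslashes and quotes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def turtle(subject, rdf_type, properties):
    """Serialize one resource as a Turtle block from (predicate, object) pairs."""
    lines = [f"{subject} a {rdf_type}"]
    lines += [f"    {pred} {obj}" for pred, obj in properties]
    return " ;\n".join(lines) + " ."


def describe_dataset():
    """Build a description, a version record, and a distribution record."""
    header = "\n".join(f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items())
    # Description and attribution of the dataset as a whole.
    summary = turtle("ex:demo", "dctypes:Dataset", [
        ("dct:title", literal("Demo dataset")),
        ("dct:publisher", "<http://example.org/org>"),
        ("pav:hasCurrentVersion", "ex:demo-v2"),
    ])
    # Versioning: one specific release of the dataset.
    version = turtle("ex:demo-v2", "dctypes:Dataset", [
        ("dct:isVersionOf", "ex:demo"),
        ("pav:version", literal("2")),
        ("dcat:distribution", "ex:demo-v2-rdf"),
    ])
    # Content summarization for one concrete file of that release.
    distribution = turtle("ex:demo-v2-rdf", "dcat:Distribution, void:Dataset", [
        ("dct:format", literal("text/turtle")),
        ("void:triples", "1000"),
    ])
    return "\n\n".join([header, summary, version, distribution])


if __name__ == "__main__":
    print(describe_dataset())
```

Building the Turtle by hand keeps the sketch free of third-party dependencies; in practice an RDF library would be the safer choice for escaping and serialization.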
Keywords