Persistent Identifier Practice for Big Data Management at NCI

Jingbo Wang; Nicolas Car; Ben Evans; Kashif Gohar; Claire Trenham; Lesley Wyborn

doi:10.5334/dsj-2017-020

Data Science Journal (Apr 2017)

Affiliations

Jingbo Wang: National Computational Infrastructure, Canberra
Nicolas Car: Geoscience Australia, Canberra
Ben Evans: National Computational Infrastructure, Canberra
Kashif Gohar: National Computational Infrastructure, Canberra
Claire Trenham: National Computational Infrastructure, Canberra
Lesley Wyborn: National Computational Infrastructure, Canberra

Journal volume & issue: Vol. 16

Abstract

The National Computational Infrastructure (NCI) manages over 10 PB of research data, co-located with the high-performance computer (Raijin) and an HPC-class, 3000-core OpenStack cloud system (Tenjin). In support of this integrated High Performance Computing/High Performance Data (HPC/HPD) infrastructure, NCI’s data management practices include building catalogues, minting DOIs, curating and publishing data, and delivering data through a variety of data services. The metadata catalogues, DOIs, THREDDS and Vocabularies all use different Uniform Resource Locator (URL) styles. A Persistent IDentifier (PID) service provides an important utility for managing URLs in a consistent, controlled and monitored manner, supporting the robustness of our national ‘Big Data’ infrastructure. In this paper we demonstrate NCI’s approach of using its ‘PID Service’ to consistently manage persistent identifiers across various applications.
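The core idea of a PID service, as described above, is that clients use one stable identifier while the service redirects them to whichever current URL (catalogue record, THREDDS endpoint, vocabulary term) holds the resource, and records usage along the way. The following is a minimal sketch of that idea, assuming a simple ordered table of regular-expression rewrite rules served as a WSGI application. It is not the code or configuration of NCI's PID Service; the path prefixes, `example.org` hosts, rule table, status codes and the names `Rule`, `RULES`, `resolve`, `hits` and `app` are all hypothetical.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Maps one persistent-URI pattern to the current location of a resource."""
    pattern: re.Pattern  # matched against the request path
    target: str          # template; \1, \2 ... refer to captured groups
    status: int = 302    # HTTP redirect code sent back to the client


# Hypothetical rules: the path prefixes and example.org hosts are placeholders
# chosen for illustration, not NCI's real PID configuration.
RULES = [
    Rule(re.compile(r"^/catalogue/(\w+)$"),
         r"https://catalogue.example.org/record/\1"),
    Rule(re.compile(r"^/thredds/(.+)$"),
         r"https://thredds.example.org/thredds/catalog/\1", 303),
    Rule(re.compile(r"^/def/voc/(\w+)/(\w+)$"),
         r"https://vocabs.example.org/\1/concept/\2", 303),
]

# Per-rule request counts, so that PID usage can be monitored.
hits: Counter = Counter()


def resolve(path: str, rules=RULES) -> Optional[Tuple[int, str]]:
    """Return (status, location) for the first rule matching path, else None."""
    for rule in rules:
        match = rule.pattern.match(path)
        if match:
            hits[rule.pattern.pattern] += 1
            return rule.status, match.expand(rule.target)
    return None


REASONS = {302: "Found", 303: "See Other"}


def app(environ, start_response):
    """Minimal WSGI application that answers PID requests with redirects."""
    result = resolve(environ.get("PATH_INFO", ""))
    if result is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"No PID mapping for this path\n"]
    status, location = result
    start_response(f"{status} {REASONS.get(status, 'Redirect')}",
                   [("Location", location)])
    return [b""]
```

To try it locally, the `app` callable could be served with the standard library's `wsgiref.simple_server.make_server`. Rules are checked in order and the first match wins, so more specific patterns should precede broader ones. Returning 303 for vocabulary concepts follows one common Linked Data convention for identifiers that name things rather than documents. When a target system moves, only the rule's target changes while the published identifiers stay the same.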

Published in Data Science Journal

ISSN: 1683-1470 (Online)
Publisher: Ubiquity Press
Country of publisher: United Kingdom
LCC subjects: Science: Science (General)
Website: http://datascience.codata.org/