Bausteine Forschungsdatenmanagement (May 2023)
Das Forschungsdatenzentrum der Universität Hamburg
Abstract
Recent discussions of research data practices at conferences and workshops and in related publications suggest that the problems and solutions in managing data differ substantially between scientific disciplines. The gap seems particularly profound between the natural sciences and the humanities, whereas the social and life sciences are placed somewhere in between. Indeed, data centers tailored to the specific needs of a single discipline (physics, chemistry, climate studies) are numerous in the natural sciences and nearly absent for individual humanities subjects. While the former ask for and report solutions for scaling up (larger quantities of data can be run by the same application) and scaling out (larger quantities of data can use the same infrastructure), the latter are concerned with the heterogeneity of relatively small amounts of data (the long-tail problem) and a divergence in agreed standards, something we may term cross scaling. In either case, an efficiency problem has to be solved. On the one hand, huge amounts of data have to be handled within an acceptable time frame; on the other hand, many different applications with diverse functionalities have to be handled with an acceptable amount of resources. We argue here that, independent of the discipline, both optimization problems should be addressed. Over the last decade, we have also observed that projects in the natural sciences diversify and prefer individualized solutions, which hints at increasing data heterogeneity there as well, while at the same time some humanities projects produce petabytes of data. To show the necessity of a differentiated approach, the research data center of Universität Hamburg is offered as a case in point.
The evolution of the center from one specialized in humanities projects to a research data center offering services for the whole university, while other disciplinary data centers continue to exist side by side, illustrates the entire range of tasks of data stewardship. This range includes the continuous development of services as the center becomes more and more involved in natural science projects, as well as task sharing and communication with other data institutions. A multidisciplinary team is a core asset for understanding the requirements of each discipline. Yet the main organizing principle of the offered services is the stages of the data life cycle (1. data creation and deposit, 2. managing active data, 3. data repositories and archives, 4. data catalogs and registries). The interlocking of these stages is paramount in the long-term strategy.
Keywords