International Journal of Digital Curation (Jul 2017)
Preserving Transactional Data
Abstract
This paper is an adaptation of a longer report commissioned by the UK Data Service. The longer report contributes to ongoing support for the Big Data Network, a programme funded by the Economic and Social Research Council (ESRC), and can be found at doi:10.7207/twr16-02. This paper discusses requirements for preserving transactional data and the accompanying challenges facing the companies and institutions that aim to re-use these data for analysis or research. It presents a range of use cases (examples of transactional data) in order to describe the characteristics of these ‘big’ data and the difficulties they pose for long-term access. Based on the overarching trends discerned in these use cases, the paper defines the challenges of preserving these data early in the curation lifecycle. It points to potential solutions within current legal and ethical frameworks, but focuses on positioning the problem of re-using these data from a preservation perspective. In some contexts, these data could be fiscal in nature, deriving from business ‘transactions’. This paper, however, considers transactional data more broadly, addressing any data generated through interactions with a database system. Administrative data, for instance, are one important form of transactional data, collected primarily for operational purposes rather than for research. Examples include information collected by government departments and other organisations when delivering a service (e.g. tax, health, or education); such data can entail significant legal and ethical challenges for re-use. Transactional data, whether created by interactions between government database systems and citizens or by automatic sensors or machines, hold potential for future developments in academic research and consumer analytics. Re-use of reliable transactional data in research has the power to improve services and investments by organisations in many different sectors.
Ultimately, however, these data will only lead to new discoveries and insights if they are effectively curated and preserved to ensure appropriate reproducibility. This paper explores challenges to this undertaking and approaches to ensuring long-term access.