An overview on synthetic administrative data for research

Theodora Kokosi; Bianca De Stavola; Robin Mitra; Lora Frayling; Aiden Doherty; Iain Dove; Pam Sonnenberg; Katie Harron

doi:10.23889/ijpds.v7i1.1727

International Journal of Population Data Science (May 2022)

Affiliations

Theodora Kokosi: Department of Population, Policy and Practice, UCL Great Ormond Street Institute of Child Health, University College London, London, UK
Bianca De Stavola: Department of Population, Policy and Practice, UCL Great Ormond Street Institute of Child Health, University College London, London, UK
Robin Mitra: School of Mathematics, Cardiff University, Cardiff, UK
Lora Frayling: Health Data Insight, UK
Aiden Doherty: Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK; Nuffield Department of Population Health, University of Oxford, Oxford, UK
Iain Dove: Office for National Statistics, Titchfield, UK
Pam Sonnenberg: Department of Infection & Population Health, Institute for Global Health, University College London, London, UK
Katie Harron: Department of Population, Policy and Practice, UCL Great Ormond Street Institute of Child Health, University College London, London, UK

DOI: https://doi.org/10.23889/ijpds.v7i1.1727
Journal volume & issue: Vol. 7, no. 1

Abstract

Use of administrative data for research and for planning services has increased over recent decades due to the value of the large, rich information available. However, concerns about the release of sensitive or personal data and the associated disclosure risk can lead to lengthy approval processes and restricted data access. This can delay or prevent the production of timely evidence. A promising solution to facilitate more efficient data access is to create synthetic versions of the original datasets which do not hold any confidential information and can minimise disclosure risk. Such data may be used as an interim solution, allowing researchers to develop their analysis plans on non-disclosive data, whilst waiting for access to the real data. We aim to provide an overview of the background and uses of synthetic data, describe common methods used to generate synthetic data in the context of UK administrative research, propose a simplified terminology for categories of synthetic data, and illustrate challenges and future directions for research.

Published in International Journal of Population Data Science

ISSN: 2399-4908 (Online)
Publisher: Swansea University
Country of publisher: United Kingdom
LCC subjects: Social Sciences: Economic theory. Demography: Demography. Population. Vital events
Website: https://ijpds.org
