Statistical biases due to anonymization evaluated in an open clinical dataset from COVID-19 patients

Carolin E. M. Koll; Sina M. Hopff; Thierry Meurers; Chin Huang Lee; Mirjam Kohls; Christoph Stellbrink; Charlotte Thibeault; Lennart Reinke; Sarah Steinbrecher; Stefan Schreiber; Lazar Mitrov; Sandra Frank; Olga Miljukov; Johanna Erber; Johannes C. Hellmuth; Jens-Peter Reese; Fridolin Steinbeis; Thomas Bahmer; Marina Hagen; Patrick Meybohm; Stefan Hansch; István Vadász; Lilian Krist; Steffi Jiru-Hillmann; Fabian Prasser; Jörg Janne Vehreschild; NAPKON Study Group

doi:10.1038/s41597-022-01669-9

Scientific Data (Dec 2022)

Affiliations

Carolin E. M. Koll: University of Cologne, Faculty of Medicine and University Hospital Cologne, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf
Sina M. Hopff: University of Cologne, Faculty of Medicine and University Hospital Cologne, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf
Thierry Meurers: Berlin Institute of Health at Charité – Universitätsmedizin Berlin
Chin Huang Lee: University of Cologne, Faculty of Medicine and University Hospital Cologne, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf
Mirjam Kohls: University of Wuerzburg, Faculty of Medicine, Institute for Clinical Epidemiology and Biometry
Christoph Stellbrink: Department of Cardiology and Intensive Care Medicine, Bielefeld Medical Centre, Medical Faculty OWL, University of Bielefeld
Charlotte Thibeault: Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin
Lennart Reinke: Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel
Sarah Steinbrecher: Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin
Stefan Schreiber: Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel
Lazar Mitrov: University of Cologne, Faculty of Medicine and University Hospital Cologne, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf
Sandra Frank: Department of Anesthesiology, University Hospital of Ludwig-Maximilians-University (LMU)
Olga Miljukov: University of Wuerzburg, Faculty of Medicine, Institute for Clinical Epidemiology and Biometry
Johanna Erber: Technical University of Munich, School of Medicine, University Hospital rechts der Isar, Department of Internal Medicine II
Johannes C. Hellmuth: Department of Medicine III, University Hospital, LMU Munich
Jens-Peter Reese: University of Wuerzburg, Faculty of Medicine, Institute for Clinical Epidemiology and Biometry
Fridolin Steinbeis: Charité – Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt Universität zu Berlin
Thomas Bahmer: Internal Medicine Department I, University Medical Center Schleswig-Holstein Campus Kiel
Marina Hagen: Department II for Internal Medicine, Hematology/Oncology, University Hospital Frankfurt
Patrick Meybohm: Department of Anaesthesiology, Intensive Care, Emergency and Pain Medicine, University Hospital Wuerzburg
Stefan Hansch: Department of Infection Prevention and Infectious Diseases, University Hospital Regensburg
István Vadász: Department of Internal Medicine, Justus Liebig University, Universities of Giessen and Marburg Lung Center (UGMLC), Member of the German Center for Lung Research (DZL)
Lilian Krist: Institute of Social Medicine, Epidemiology and Health Economics, Charité-Universitätsmedizin Berlin
Steffi Jiru-Hillmann: University of Wuerzburg, Faculty of Medicine, Institute for Clinical Epidemiology and Biometry
Fabian Prasser: Berlin Institute of Health at Charité – Universitätsmedizin Berlin
Jörg Janne Vehreschild: University of Cologne, Faculty of Medicine and University Hospital Cologne, Department I of Internal Medicine, Center for Integrated Oncology Aachen Bonn Cologne Duesseldorf
NAPKON Study Group

DOI: https://doi.org/10.1038/s41597-022-01669-9
Journal volume & issue: Vol. 9, no. 1, pp. 1–15

Abstract

Anonymization has the potential to foster the sharing of medical data. State-of-the-art methods use mathematical models to modify data in order to reduce privacy risks. However, the degree of protection must be balanced against the impact on statistical properties. We studied an extreme case of this trade-off: the statistical validity of an open medical dataset based on the German National Pandemic Cohort Network (NAPKON), which was prepared for publication using a strong anonymization procedure. Descriptive statistics and results of regression analyses were compared before and after anonymization of multiple variants of the original dataset. Despite significant differences in value distributions, the statistical bias was found to be small in all cases. In the regression analyses, the median absolute deviations of the estimated adjusted odds ratios for different sample sizes ranged from 0.01 [minimum = 0, maximum = 0.58] to 0.52 [minimum = 0.25, maximum = 0.91]. Disproportionate impact on the statistical properties of data is a common argument against the use of anonymization. Our analysis demonstrates that anonymization can actually preserve the validity of statistical results in relatively low-dimensional data.
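
As an illustration of the summary statistic reported in the abstract, the following minimal sketch computes the median, minimum, and maximum of the absolute deviations between paired odds ratio estimates. It assumes that "absolute deviation" means the absolute difference between an estimate from the original data and the corresponding estimate from the anonymized data; the abstract does not spell this out. The function name and all numeric values are hypothetical and are not taken from the study.

```python
from statistics import median


def absolute_deviation_summary(original_ors, anonymized_ors):
    """Summarize |OR_original - OR_anonymized| across paired estimates.

    Assumption: each position in the two lists refers to the same
    regression coefficient, estimated once on the original data and
    once on the anonymized data.
    """
    if len(original_ors) != len(anonymized_ors):
        raise ValueError("both lists must contain the same number of estimates")
    if not original_ors:
        raise ValueError("at least one pair of estimates is required")

    deviations = [abs(o - a) for o, a in zip(original_ors, anonymized_ors)]
    return {
        "median": median(deviations),
        "minimum": min(deviations),
        "maximum": max(deviations),
    }


# Made-up odds ratios for illustration only; not results from the paper.
original = [1.50, 0.80, 2.10, 1.20]
anonymized = [1.45, 0.80, 2.40, 1.10]

summary = absolute_deviation_summary(original, anonymized)
print(summary)
```

A small median with a larger maximum, as in the ranges quoted in the abstract, would indicate that most estimates barely move while a few individual coefficients shift more noticeably.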

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/
