Federated difference-in-differences with multiple time periods in DataSHIELD

Manuel Huth; Carolina Alvarez Garavito; Lea Seep; Laia Cirera; Francisco Saúte; Elisa Sicuri; Jan Hasenauer

iScience (Nov 2024)

Affiliations

Manuel Huth: Institute for Computational Biology, Helmholtz Munich - German Research Center for Environmental Health, Munich, Germany; LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Carolina Alvarez Garavito: LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Lea Seep: LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Laia Cirera: ISGlobal, Barcelona, Spain
Francisco Saúte: Centro de Investigação em Saúde de Manhiça, Manhiça, Mozambique
Elisa Sicuri: ISGlobal, Barcelona, Spain; Centro de Investigação em Saúde de Manhiça, Manhiça, Mozambique; LSE Health - Department of Health Policy, London School of Economics and Political Science, London, UK; Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
Jan Hasenauer: Institute for Computational Biology, Helmholtz Munich - German Research Center for Environmental Health, Munich, Germany; LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany; Corresponding author

Journal volume & issue: Vol. 27, no. 11, p. 111025

Abstract

Summary: Difference-in-differences (DID) is a key tool for causal impact evaluation but faces challenges when applied to sensitive data restricted by privacy regulations. Obtaining consent can shrink sample sizes and reduce statistical power, limiting the analysis’s effectiveness. Federated learning addresses these issues by sharing aggregated statistics rather than individual data, though advanced federated DID software is limited. We developed a federated version of the Callaway and Sant’Anna difference-in-differences (CSDID), integrated into the DataSHIELD platform, adhering to stringent privacy protocols. Our approach reproduces key estimates and standard errors while preserving confidentiality. Using simulated and real-world data from a malaria intervention in Mozambique, we demonstrate that federated estimates increase sample sizes, reduce estimation uncertainty, and enable analyses when data owners cannot share treated or untreated group data. Our work contributes to facilitating the evaluation of policy interventions or treatments across centers and borders.

Published in iScience

ISSN: 2589-0042 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science
Website: http://www.cell.com/iscience/home
