Federated difference-in-differences with multiple time periods in DataSHIELD
Manuel Huth,
Carolina Alvarez Garavito,
Lea Seep,
Laia Cirera,
Francisco Saúte,
Elisa Sicuri,
Jan Hasenauer
Affiliations
Manuel Huth
Institute for Computational Biology, Helmholtz Munich - German Research Center for Environmental Health, Munich, Germany; LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Carolina Alvarez Garavito
LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Lea Seep
LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany
Laia Cirera
ISGlobal, Barcelona, Spain
Francisco Saúte
Centro de Investigação em Saúde de Manhiça, Manhiça, Mozambique
Elisa Sicuri
ISGlobal, Barcelona, Spain; Centro de Investigação em Saúde de Manhiça, Manhiça, Mozambique; LSE Health - Department of Health Policy, London School of Economics and Political Science, London, UK; Facultat de Medicina i Ciències de la Salut, Universitat de Barcelona, Barcelona, Spain
Jan Hasenauer
Institute for Computational Biology, Helmholtz Munich - German Research Center for Environmental Health, Munich, Germany; LIMES, Faculty of Mathematics and Natural Sciences, University of Bonn, Bonn, Germany; Corresponding author
Summary: Difference-in-differences (DID) is a key tool for causal impact evaluation, but it is difficult to apply when sensitive data are restricted by privacy regulations. Obtaining consent can shrink sample sizes and reduce statistical power, limiting what an analysis can detect. Federated learning addresses these issues by sharing aggregated statistics rather than individual-level data, but software for advanced federated DID methods is scarce. We developed a federated version of the Callaway and Sant’Anna difference-in-differences (CSDID) estimator and integrated it into the DataSHIELD platform, adhering to its stringent privacy protocols. Our approach reproduces key estimates and standard errors while preserving confidentiality. Using simulated data and real-world data from a malaria intervention in Mozambique, we demonstrate that federated estimation increases the usable sample size, reduces estimation uncertainty, and enables analyses when data owners cannot share data on the treated or the untreated group. Our work facilitates the evaluation of policy interventions or treatments across centers and borders.
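To make the idea of sharing aggregated statistics concrete, the following is a minimal sketch of a two-period, two-group DID computed from per-site sums and counts only. It is not the federated CSDID estimator described here and does not use the DataSHIELD API; the names `SiteAggregate`, `local_aggregate`, and `federated_did` and all numbers are hypothetical and chosen purely for illustration. The example assumes a second site that holds only untreated units, so it could not estimate an effect on its own.

```python
from dataclasses import dataclass


@dataclass
class SiteAggregate:
    """Sums and counts of outcome changes (post minus pre) released by one site."""
    treated_sum: float
    treated_n: int
    control_sum: float
    control_n: int


def local_aggregate(changes, treated):
    """Runs at a site: reduce individual-level rows to sums and counts."""
    t = [c for c, d in zip(changes, treated) if d]
    u = [c for c, d in zip(changes, treated) if not d]
    return SiteAggregate(sum(t), len(t), sum(u), len(u))


def federated_did(aggregates):
    """Runs at the analyst: combine site aggregates into a 2x2 DID estimate."""
    t_sum = sum(a.treated_sum for a in aggregates)
    t_n = sum(a.treated_n for a in aggregates)
    c_sum = sum(a.control_sum for a in aggregates)
    c_n = sum(a.control_n for a in aggregates)
    # Mean change among treated minus mean change among untreated units.
    return t_sum / t_n - c_sum / c_n


# Hypothetical toy data: outcome changes and treatment indicators per site.
site_a = local_aggregate([2.0, 3.0, 1.0], [True, True, False])
site_b = local_aggregate([1.0, 0.0, 2.0], [False, False, False])  # untreated only

estimate = federated_did([site_a, site_b])
print(estimate)
```

Because sums and counts are additive across sites, the combined estimate equals the one obtained from the pooled individual-level data, while only four numbers leave each site. A real deployment would additionally need disclosure controls, for example refusing to release aggregates for very small groups.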