Digital Health (Oct 2022)

Federated learning for preserving data privacy in collaborative healthcare research

  • Tyler J Loftus,
  • Matthew M Ruppert,
  • Benjamin Shickel,
  • Tezcan Ozrazgat-Baslanti,
  • Jeremy A Balch,
  • Philip A Efron,
  • Gilbert R Upchurch,
  • Parisa Rashidi,
  • Christopher Tignanelli,
  • Jiang Bian,
  • Azra Bihorac

DOI: https://doi.org/10.1177/20552076221134455
Journal volume & issue: Vol. 8

Abstract

Generalizability, external validity, and reproducibility are high priorities for artificial intelligence applications in healthcare. Traditional approaches to addressing these elements involve sharing patient data between institutions or practice settings, which can compromise data privacy (individuals’ right to prevent the sharing and disclosure of information about themselves) and data security (simultaneously preserving confidentiality, accuracy, fidelity, and availability of data). This article describes insights from real-world implementation of federated learning techniques that offer opportunities to maintain both data privacy and availability via collaborative machine learning that shares knowledge, not data. Local models are trained separately on local data. As they train, they send local model updates (e.g. coefficients or gradients) for consolidation into a global model. In some use cases, global models outperform local models on new, previously unseen local datasets, suggesting that collaborative learning from a greater number of examples, including a greater number of rare cases, may improve predictive performance. Even when sharing model updates rather than data, privacy leakage can occur when adversaries perform property or membership inference attacks which can be used to ascertain information about the training set. Emerging techniques mitigate risk from adversarial attacks, allowing investigators to maintain both data privacy and availability in collaborative healthcare research. When data heterogeneity between participating centers is high, personalized algorithms may offer greater generalizability by improving performance on data from centers with proportionately smaller training sample sizes. Properly applied, federated learning has the potential to optimize the reproducibility and performance of collaborative learning while preserving data security and privacy.
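The consolidation step described in the abstract, in which local model updates such as coefficients are combined into a global model without pooling patient data, can be illustrated with a minimal FedAvg-style sketch. This is not the authors' implementation; the centers, sample sizes, and data below are synthetic stand-ins, and coefficient averaging is one common aggregation choice among several.

```python
# Minimal sketch of federated averaging of model coefficients, assuming each
# center fits a local linear model and shares only its coefficients and
# sample count with the server -- never the underlying data. All names and
# numbers here are illustrative, not taken from the article.
import numpy as np

rng = np.random.default_rng(0)

def train_local_model(X, y):
    """Fit local coefficients by ordinary least squares on local data only."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Three hypothetical centers with different sample sizes.
true_coef = np.array([2.0, -1.0])
centers = []
for n in (50, 120, 200):
    X = rng.normal(size=(n, 2))
    y = X @ true_coef + rng.normal(scale=0.1, size=n)
    centers.append((X, y))

# Each center transmits only its model update (coefficients) and sample count.
updates = [(train_local_model(X, y), len(y)) for X, y in centers]

# The server consolidates the updates into a global model via a
# sample-size-weighted average, so larger centers contribute proportionally.
total = sum(n for _, n in updates)
global_coef = sum(coef * (n / total) for coef, n in updates)

print(global_coef)  # close to the coefficients underlying all three centers
```

In practice the same pattern is applied round by round to gradients or neural network weights, and the personalization the abstract mentions corresponds to letting each center adapt the global model further on its own data.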