Benchmarking PySyft Federated Learning Framework on MIMIC-III Dataset

Andrius Budrionis; Magda Miara; Piotr Miara; Szymon Wilk; Johan Gustav Bellika

doi:10.1109/ACCESS.2021.3105929

IEEE Access (Jan 2021)

Affiliations

Andrius Budrionis: Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway
Magda Miara: Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland
Piotr Miara: Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland
Szymon Wilk: Faculty of Computing and Telecommunications, Poznan University of Technology, Poznan, Poland
Johan Gustav Bellika: Norwegian Centre for E-health Research, University Hospital of North Norway, Tromsø, Norway

DOI: https://doi.org/10.1109/ACCESS.2021.3105929
Journal volume & issue: Vol. 9
pp. 116869 – 116878

Abstract

The adoption of advanced data analytics methods has been limited in industries governed by strict data reuse regulations, such as healthcare. Barriers to data access and sharing have affected numerous research and development initiatives in healthcare, resulting in major delays, extensive use of resources for data access, and findings originating from datasets that are too small to be generalizable. Federated machine learning presents a solution to the problems health data analytics projects are facing by providing a way of complying with strict regulatory requirements without sacrificing privacy. Computing frameworks supporting federated machine learning are still in their infancy, and their performance in realistic settings has been studied only to a limited extent. To expand the existing knowledge on federated learning in realistic deployment settings, three groups of experiments were designed, comparing the performance of a neural network-based model trained in a federated manner to that of an equivalent baseline model trained on centralized data storage. Experiments were conducted on the MIMIC-III dataset and modelled a binary classification problem: predicting in-hospital mortality. The effects that varying amounts of data, the number of computational nodes, and the data distribution in the federated network had on model performance and on training and inference durations were studied. Experiments demonstrated that models trained in federated settings achieved predictive performance comparable to that of the baseline in terms of area under the ROC curve and F1 score. Data distribution across computing nodes showed minimal to no effect on model performance or on training and inference durations. However, federated model training and inference took approximately 9 and 40 times longer, respectively, than the equivalent tasks executed in centralized settings. These results indicate that federated learning is a viable solution for enabling advanced data analytics in environments regulated by strict privacy requirements.
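
The idea of training one model across several nodes without pooling their data can be illustrated with a minimal sketch. It is not the authors' implementation and does not use PySyft or MIMIC-III. It assumes federated averaging as the aggregation rule (the abstract does not name one), swaps the neural network for a plain logistic regression, and uses hypothetical function names (`local_update`, `federated_average`, `train_federated`) chosen only for illustration.

```python
import numpy as np


def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient-descent steps of logistic regression on one node's data.

    Assumption: each node trains locally and only model weights leave the node.
    """
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-(X @ w)))  # sigmoid of the linear score
        grad = X.T @ (preds - y) / len(y)       # mean log-loss gradient
        w -= lr * grad
    return w


def federated_average(node_weights, node_sizes):
    """Combine node models into one, weighting each by its local sample count."""
    sizes = np.asarray(node_sizes, dtype=float)
    stacked = np.stack(node_weights)            # shape: (n_nodes, n_features)
    return (stacked * sizes[:, None]).sum(axis=0) / sizes.sum()


def train_federated(shards, n_features, rounds=20):
    """Alternate local training on every shard with a global averaging step."""
    w = np.zeros(n_features)
    for _ in range(rounds):
        updates = [local_update(w, X, y) for X, y in shards]
        w = federated_average(updates, [len(y) for _, y in shards])
    return w
```

Here `shards` is a list of `(X, y)` pairs, one per node, with `y` holding 0/1 labels. Passing a single shard that contains all the data gives a centralized counterpart trained with the same number of gradient steps, so the two settings can be compared on the same held-out data.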

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
