JMIR Medical Informatics (Jan 2021)

Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach

  • Vaid, Akhil
  • Jaladanki, Suraj K
  • Xu, Jie
  • Teng, Shelly
  • Kumar, Arvind
  • Lee, Samuel
  • Somani, Sulaiman
  • Paranjpe, Ishan
  • De Freitas, Jessica K
  • Wanyan, Tingyi
  • Johnson, Kipp W
  • Bicak, Mesude
  • Klang, Eyal
  • Kwon, Young Joon
  • Costa, Anthony
  • Zhao, Shan
  • Miotto, Riccardo
  • Charney, Alexander W
  • Böttinger, Erwin
  • Fayad, Zahi A
  • Nadkarni, Girish N
  • Wang, Fei
  • Glicksberg, Benjamin S

DOI
https://doi.org/10.2196/24207
Journal volume & issue
Vol. 9, no. 1
p. e24207

Abstract

Background: Machine learning models require large datasets that may be siloed across different health care institutions. Machine learning studies that focus on COVID-19 have been limited to single-hospital data, which limits model generalizability.

Objective: We aimed to use federated learning, a machine learning technique that avoids locally aggregating raw clinical data across multiple institutions, to predict mortality in hospitalized patients with COVID-19 within 7 days.

Methods: Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization/least absolute shrinkage and selection operator (LASSO) and multilayer perceptron (MLP) models were trained by using local data at each site. We developed a pooled model with combined data from all 5 sites, and a federated model that only shared parameters with a central aggregator.

Results: The LASSO_federated model outperformed the LASSO_local model at 3 hospitals, and the MLP_federated model performed better than the MLP_local model at all 5 hospitals, as determined by the area under the receiver operating characteristic curve. The LASSO_pooled model outperformed the LASSO_federated model at all hospitals, and the MLP_federated model outperformed the MLP_pooled model at 2 hospitals.

Conclusions: The federated learning of COVID-19 electronic health record data shows promise in developing robust predictive models without compromising patient privacy.
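The federated setup described in the Methods (each site trains on its own data and shares only parameters with a central aggregator) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes sample-size-weighted averaging of parameters (in the style of federated averaging), a simple subgradient step for the L1 penalty, and synthetic data standing in for 5 sites. All function names, hyperparameters, and data here are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, X, y, lr=0.1, l1=0.001, epochs=5):
    """Train on one site's data, starting from the shared weights.

    Assumption: plain gradient descent with an L1 subgradient term,
    used here as a simple stand-in for LASSO-style regularization.
    """
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * (gj + l1 * ((wj > 0) - (wj < 0)))
             for wj, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    """Central aggregator: average parameters weighted by site sample size."""
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
            for j in range(dim)]


def train_federated(sites, dim, rounds=20):
    """Each round: sites train locally, then only weights are aggregated."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        local = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(local, [len(X) for X, _ in sites])
    return global_w


def make_site(rng, n, true_w):
    """Generate synthetic (X, y) for one hypothetical site; not study data."""
    X, y = [], []
    for _ in range(n):
        xi = [1.0] + [rng.gauss(0, 1) for _ in range(len(true_w) - 1)]
        p = sigmoid(sum(a * b for a, b in zip(true_w, xi)))
        X.append(xi)
        y.append(1 if rng.random() < p else 0)
    return X, y


rng = random.Random(0)
true_w = [-1.0, 2.0, -1.5]  # hypothetical generating coefficients
sites = [make_site(rng, n, true_w) for n in (200, 120, 80, 150, 60)]
global_w = train_federated(sites, dim=3)
```

In this sketch the raw rows in `sites` are only ever read inside `local_update`; the aggregator sees nothing but weight vectors and site sizes, which is the property the abstract attributes to the federated model. A pooled model, by contrast, would concatenate all sites' rows before training.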