COLA-GLM: collaborative one-shot and lossless algorithms of generalized linear models for decentralized observational healthcare data

Qiong Wu; Jenna M. Reps; Lu Li; Bingyu Zhang; Yiwen Lu; Jiayi Tong; Dazheng Zhang; Thomas Lumley; Milou T. Brand; Mui Van Zandt; Thomas Falconer; Xing He; Yu Huang; Haoyang Li; Chao Yan; Guojun Tang; Andrew E. Williams; Fei Wang; Jiang Bian; Bradley Malin; George Hripcsak; Martijn J. Schuemie; Yun Lu; Steve Drew; Jiayu Zhou; David A. Asch; Yong Chen

doi:10.1038/s41746-025-01781-1

npj Digital Medicine (Jul 2025)

Affiliations

Qiong Wu: Department of Biostatistics and Health Data Science, University of Pittsburgh
Jenna M. Reps: Observational Health Data Sciences and Informatics
Lu Li: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Bingyu Zhang: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Yiwen Lu: The Center for Health AI and Synthesis of Evidence (CHASE), University of Pennsylvania
Jiayi Tong: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine
Dazheng Zhang: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine
Thomas Lumley: Department of Statistics, Faculty of Science, University of Auckland
Milou T. Brand: Real World Solutions, IQVIA
Mui Van Zandt: Observational Health Data Sciences and Informatics
Thomas Falconer: Department of Biomedical Informatics, Columbia University Irving Medical Center
Xing He: Department of Biostatistics and Health Data Science, Indiana University
Yu Huang: Department of Biostatistics and Health Data Science, Indiana University
Haoyang Li: Department of Population Health Sciences, Weill Cornell Medicine
Chao Yan: Department of Biomedical Informatics, Vanderbilt University Medical Center
Guojun Tang: Department of Electrical and Software Engineering, University of Calgary
Andrew E. Williams: Clinical and Translational Science Institute, Tufts Medical Center
Fei Wang: Department of Population Health Sciences, Weill Cornell Medicine
Jiang Bian: Department of Biostatistics and Health Data Science, Indiana University
Bradley Malin: Department of Biomedical Informatics, Vanderbilt University Medical Center
George Hripcsak: Department of Biomedical Informatics, Columbia University Irving Medical Center
Martijn J. Schuemie: Observational Health Data Sciences and Informatics
Yun Lu: Center for Biologics Evaluation and Research, Food and Drug Administration
Steve Drew: Department of Electrical and Software Engineering, University of Calgary
Jiayu Zhou: School of Information, University of Michigan
David A. Asch: Leonard Davis Institute of Health Economics, University of Pennsylvania
Yong Chen: Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine

Journal volume & issue: Vol. 8, no. 1
pp. 1 – 11

Abstract

Clinical insights from real-world data often require aggregating information from multiple institutions to ensure sufficient sample sizes and generalizability. However, patient privacy concerns often limit the sharing of patient-level data, and traditional federated learning algorithms, which rely on extensive back-and-forth communication, can be inefficient to implement. We introduce the Collaborative One-shot Lossless Algorithm for Generalized Linear Models (COLA-GLM), a novel federated learning algorithm that supports diverse outcome types via generalized linear models and achieves results identical to a pooled patient-level data analysis (lossless) with only a single round of aggregated data exchange (one-shot). To further protect aggregated institutional data, we developed a secure extension, secure-COLA-GLM, which uses homomorphic encryption. We demonstrated the effectiveness and lossless property of COLA-GLM through applications to an international influenza cohort and a decentralized U.S. COVID-19 mortality study. COLA-GLM and secure-COLA-GLM offer a scalable, efficient solution for decentralized collaborative learning involving multiple data partners and diverse security requirements.
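The abstract does not spell out how a single exchange of aggregated data can reproduce a pooled analysis. The sketch below is not the COLA-GLM algorithm; it is a minimal illustration of the one-shot, lossless idea for the simplest generalized linear model (Gaussian outcome, identity link, one covariate), where per-site sums are sufficient statistics. The three sites, their toy data, and the helper names `site_summary`, `combine`, and `fit` are all hypothetical.

```python
# Illustrative sketch only: NOT the COLA-GLM algorithm from the paper.
# It shows the "one-shot, lossless" idea for a Gaussian GLM with identity
# link, where each site shares only aggregated sums (hypothetical toy data).

def site_summary(xs, ys):
    """Aggregate statistics a site would share instead of patient-level rows."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sxx": sum(x * x for x in xs),
        "sy": sum(ys),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
    }

def combine(summaries):
    """Coordinating center adds the site summaries in a single round."""
    keys = ("n", "sx", "sxx", "sy", "sxy")
    return {k: sum(s[k] for s in summaries) for k in keys}

def fit(summary):
    """Solve the 2x2 normal equations for (intercept, slope)."""
    n, sx, sxx = summary["n"], summary["sx"], summary["sxx"]
    sy, sxy = summary["sy"], summary["sxy"]
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    return intercept, slope

# Three hypothetical sites, each holding its own (x, y) records.
sites = [
    ([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]),
    ([4.0, 5.0], [8.1, 9.8]),
    ([6.0, 7.0, 8.0, 9.0], [12.2, 13.9, 16.1, 18.0]),
]

# Federated fit: only the aggregated summaries leave each site.
federated = fit(combine([site_summary(xs, ys) for xs, ys in sites]))

# Pooled fit: what a central analysis of all patient-level rows would give.
pooled_x = [x for xs, _ in sites for x in xs]
pooled_y = [y for _, ys in sites for y in ys]
pooled = fit(site_summary(pooled_x, pooled_y))

print("federated:", federated)
print("pooled:   ", pooled)
```

In this toy case the two fits agree up to floating-point rounding, because the pooled sums are exactly the sums of the site-level sums. Non-Gaussian outcomes generally need an iterative fit, so how COLA-GLM obtains the same property for other outcome types is described in the paper itself and is not reproduced here.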

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/