Knowledge abstraction and filtering based federated learning over heterogeneous data views in healthcare

Anshul Thakur; Soheila Molaei; Pafue Christy Nganjimi; Fenglin Liu; Andrew Soltan; Patrick Schwab; Kim Branson; David A. Clifton

doi:10.1038/s41746-024-01272-9

npj Digital Medicine (Oct 2024)

Affiliations

Anshul Thakur: Department of Engineering Science, University of Oxford
Soheila Molaei: Department of Engineering Science, University of Oxford
Pafue Christy Nganjimi: Department of Engineering Science, University of Oxford
Fenglin Liu: Department of Engineering Science, University of Oxford
Andrew Soltan: Department of Engineering Science, University of Oxford
Patrick Schwab: GlaxoSmithKline
Kim Branson: GlaxoSmithKline
David A. Clifton: Department of Engineering Science, University of Oxford

DOI: https://doi.org/10.1038/s41746-024-01272-9
Journal volume & issue: Vol. 7, no. 1, pp. 1–14

Abstract

Robust data privacy regulations hinder the exchange of healthcare data among institutions, crucial for global insights and developing generalised clinical models. Federated learning (FL) is ideal for training global models using datasets from different institutions without compromising privacy. However, disparities in electronic healthcare records (EHRs) lead to inconsistencies in ML-ready data views, making FL challenging without extensive preprocessing and information loss. These differences arise from variations in services, care standards, and record-keeping practices. This paper addresses data view heterogeneity by introducing a knowledge abstraction and filtering-based FL framework that allows FL over heterogeneous data views without manual alignment or information loss. The knowledge abstraction and filtering mechanism maps raw input representations to a unified, semantically rich shared space for effective global model training. Experiments on three healthcare datasets demonstrate the framework’s effectiveness in overcoming data view heterogeneity and facilitating information sharing in a federated setup.

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/
