Computational and Structural Biotechnology Journal (Dec 2024)

Privacy-preserving federated machine learning on FAIR health data: A real-world application

  • A. Anil Sinaci,
  • Mert Gencturk,
  • Celia Alvarez-Romero,
  • Gokce Banu Laleci Erturkmen,
  • Alicia Martinez-Garcia,
  • María José Escalona-Cuaresma,
  • Carlos Luis Parra-Calderon

Journal volume & issue
Vol. 24
pp. 136 – 145

Abstract

Objective: This paper introduces a privacy-preserving federated machine learning (ML) architecture built on Findable, Accessible, Interoperable, and Reusable (FAIR) health data. The architecture executes classification algorithms in a federated manner, so that health data owners can build models collaboratively without sharing their datasets.

Materials and methods: Using an agent-based architecture, a privacy-preserving federated ML algorithm was developed to create a global predictive model from multiple local models. The work had two parts: formally defining the algorithm in two steps (data preparation, then federated model training on FAIR health data), and constructing the architecture, whose components support execution of the algorithm. The solution was validated by five healthcare organizations using their own health datasets.

Results: Five organizations transformed their datasets into Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) via a common FAIRification workflow and software set, thereby generating FAIR datasets. Each organization deployed a Federated ML Agent within its secure network, connected to a cloud-based Federated ML Manager. The system was tested on a use case predicting 30-day readmission risk for patients with chronic obstructive pulmonary disease (COPD); the federated model achieved an accuracy of 87%.

Discussion: The paper demonstrates a practical application of privacy-preserving federated ML among five distinct healthcare entities. It highlights the value of FAIR health data for machine learning when the data are used in a federated manner that protects privacy because no data are shared.

Conclusion: The solution leverages FAIR datasets from multiple healthcare organizations for federated ML while safeguarding sensitive health data and meeting legislative privacy and security requirements.
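The abstract describes building a global predictive model from local models without moving any dataset, but it does not state the aggregation rule. As a rough illustration only, the sketch below assumes size-weighted federated averaging (FedAvg) over a small logistic regression classifier. It is not the paper's algorithm. The function names (`local_update`, `federated_average`, `train_federated`, `make_site`) are hypothetical, and the data are synthetic rather than the readmission datasets. `local_update` stands in for what a Federated ML Agent might run inside an organization's network; `federated_average` stands in for the Federated ML Manager.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, x):
    """Probability of the positive class for one feature row."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))


def local_update(weights, rows, labels, lr=0.1, epochs=5):
    """Agent side (assumed): SGD on local data; only weights and a row count leave."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w, len(rows)


def federated_average(updates):
    """Manager side (assumed): mean of local weights, weighted by local row counts."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def train_federated(sites, dim, rounds=20):
    """Alternate local training and aggregation for a fixed number of rounds."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, rows, labels) for rows, labels in sites]
        global_w = federated_average(updates)
    return global_w


def make_site(rng, n):
    """Synthetic stand-in for one organization's data: rows are [bias, feature]."""
    rows, labels = [], []
    for _ in range(n):
        f = rng.uniform(-1.0, 1.0)
        rows.append([1.0, f])
        labels.append(1 if f > 0 else 0)
    return rows, labels


def accuracy(weights, rows, labels):
    """Fraction of rows classified correctly at a 0.5 threshold."""
    hits = sum((predict(weights, x) >= 0.5) == bool(y) for x, y in zip(rows, labels))
    return hits / len(rows)


if __name__ == "__main__":
    rng = random.Random(0)
    sites = [make_site(rng, n) for n in (40, 60, 50, 30, 70)]  # five simulated sites
    model = train_federated(sites, dim=2)
    test_rows, test_labels = make_site(rng, 200)
    print("global weights:", model)
    print("held-out accuracy:", accuracy(model, test_rows, test_labels))
```

In this sketch each site returns only a weight vector and its row count, which mirrors the abstract's requirement that datasets stay with their owners. A production system would add safeguards this sketch omits, such as secure transport and protection of the shared weights.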

Keywords