Patterns (May 2022)

Privacy-preserving federated neural network learning for disease-associated cell classification

  • Sinem Sav,
  • Jean-Philippe Bossuat,
  • Juan R. Troncoso-Pastoriza,
  • Manfred Claassen,
  • Jean-Pierre Hubaux

Journal volume & issue
Vol. 3, no. 5
p. 100487

Abstract

Summary: Training accurate and robust machine learning models requires a large amount of data that is usually scattered across data silos. Sharing or centralizing the data of different healthcare institutions is, however, unfeasible or prohibitively difficult due to privacy regulations. In this work, we address this problem by using a privacy-preserving federated learning-based approach, PriCell, for complex models such as convolutional neural networks. PriCell relies on multiparty homomorphic encryption and enables the collaborative training of encrypted neural networks with multiple healthcare institutions. We preserve the confidentiality of each institution's input data, of any intermediate values, and of the trained model parameters. We efficiently replicate the training of a published state-of-the-art convolutional neural network architecture in a decentralized and privacy-preserving manner. Our solution achieves an accuracy comparable to that obtained with the centralized non-secure solution. PriCell guarantees patient privacy and ensures data utility for efficient multi-center studies involving complex healthcare data.

The bigger picture: High-quality medical machine learning models will benefit greatly from collaboration between healthcare institutions. Yet, it is usually difficult to transfer data between these institutions due to strict privacy regulations. In this study, we propose a solution, PriCell, that relies on multiparty homomorphic encryption to enable privacy-preserving collaborative machine learning while protecting, via encryption, the institutions' input data, the model, and any value exchanged between the institutions. We show the maturity of our solution by training a published state-of-the-art convolutional neural network in a decentralized and privacy-preserving manner. We compare the accuracy achieved by PriCell with that of the centralized and non-secure solutions and show that PriCell guarantees privacy without reducing the utility of the data. The benefits of PriCell constitute an important landmark for real-world applications of collaborative training while preserving privacy.

Keywords