Feature-Based Dataset Fingerprinting for Clustered Federated Learning on Medical Image Data

Daniel Scheliga; Patrick Mäder; Marco Seeland

doi:10.1080/08839514.2024.2394756

Applied Artificial Intelligence (Dec 2024)

Affiliations

Daniel Scheliga: Department of Computer Science and Automation, Data-intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ilmenau, Germany
Patrick Mäder: Department of Computer Science and Automation, Data-intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ilmenau, Germany
Marco Seeland: Department of Computer Science and Automation, Data-intensive Systems and Visualization Group (dAI.SY), Technische Universität Ilmenau, Ilmenau, Germany

DOI: https://doi.org/10.1080/08839514.2024.2394756
Journal volume & issue: Vol. 38, no. 1

Abstract

Federated Learning (FL) allows multiple clients to train a common model without sharing their private training data. In practice, federated optimization struggles with sub-optimal model utility because data is not independent and identically distributed (non-IID). Recent work has proposed to cluster clients according to dataset fingerprints to improve model utility in such situations. These fingerprints aim to capture the key characteristics of clients’ local data distributions. Recently, a mechanism was proposed to calculate dataset fingerprints from raw client data. We find that this fingerprinting mechanism comes with substantial time and memory consumption, limiting its practical use to small datasets. Additionally, shared raw data fingerprints can directly leak sensitive visual information, in certain cases even resembling the original client training data. To alleviate these problems, we propose a Feature-based dataset FingerPrinting mechanism (FFP). We use the MedMNIST database to develop a highly realistic case study for FL on medical image data. Compared to existing methods, our proposed FFP reduces the computational overhead of fingerprint calculation while achieving similar model utility. Furthermore, FFP mitigates the risk of raw data leakage from fingerprints by design.
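The general idea of clustering clients by feature-based fingerprints can be illustrated with a small sketch. The abstract does not say how FFP computes or compares fingerprints, so everything below is an illustrative assumption rather than the authors' method: the fixed linear-plus-ReLU `extract_features` stand-in, the mean-feature `fingerprint`, the greedy threshold rule in `cluster_clients`, and the synthetic "images" are all hypothetical.

```python
import math
import random


def extract_features(image, weights):
    """Hypothetical feature extractor: fixed linear projection followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]


def fingerprint(images, weights):
    """Assumed fingerprint: the mean feature vector over a client's local images."""
    feats = [extract_features(img, weights) for img in images]
    return [sum(col) / len(feats) for col in zip(*feats)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_clients(fingerprints, threshold):
    """Greedy clustering: a client joins the first cluster whose representative
    fingerprint lies within `threshold`, otherwise it starts a new cluster."""
    representatives, labels = [], []
    for fp in fingerprints:
        for idx, rep in enumerate(representatives):
            if euclidean(fp, rep) <= threshold:
                labels.append(idx)
                break
        else:
            representatives.append(fp)
            labels.append(len(representatives) - 1)
    return labels


def make_client(base, n_images, n_pixels, rng):
    """Synthetic client data: flat 'images' whose pixels scatter around `base`."""
    return [[base + rng.uniform(-0.05, 0.05) for _ in range(n_pixels)]
            for _ in range(n_images)]


rng = random.Random(0)
n_pixels, n_features = 16, 4

# Deterministic extractor weights shared by all clients (values 0.25 to 1.0).
weights = [[((i + j) % 4 + 1) / 4 for i in range(n_pixels)]
           for j in range(n_features)]

# Two clients with dark images and two with bright images (non-IID across groups).
clients = [make_client(base, 20, n_pixels, rng) for base in (0.2, 0.2, 0.8, 0.8)]

# Only the low-dimensional fingerprints, not the raw images, are compared.
fingerprints = [fingerprint(images, weights) for images in clients]
labels = cluster_clients(fingerprints, threshold=3.0)
print(labels)  # prints [0, 0, 1, 1]
```

In this toy setup each fingerprint has 4 values instead of 16 raw pixels per image, which hints at why a feature-space summary can be cheaper to compute and share than one built from raw data. It says nothing about the actual savings or utility reported in the paper.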

Published in Applied Artificial Intelligence

ISSN: 0883-9514 (Print); 1087-6545 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Science: Science (General): Cybernetics
Website: https://www.tandfonline.com/journals/uaai
