Securing a Local Training Dataset Size in Federated Learning

Young Ah Shin; Geontae Noh; Ik Rae Jeong; Ji Young Chun

doi:10.1109/ACCESS.2022.3210702

IEEE Access (Jan 2022)

Affiliations

Young Ah Shin: Graduate School of Information Security, Korea University, Seoul, South Korea
Geontae Noh: Department of Big Data and Information Security, Seoul Cyber University, Seoul, South Korea
Ik Rae Jeong: Graduate School of Information Security, Korea University, Seoul, South Korea
Ji Young Chun: Department of Big Data and Information Security, Seoul Cyber University, Seoul, South Korea

Journal volume & issue: Vol. 10, pp. 104135–104143

Abstract

Federated learning (FL) is an emerging paradigm for training a global machine learning (ML) model on data that remain decentralized across clients and are never shared. Although FL is more secure than conventional centralized ML training, industries whose training data consist primarily of personal information, such as MRI images or electronic health records (EHR), should be more cautious about privacy and security issues when using FL. For example, unbalanced dataset sizes may reveal meaningful information that can lead to security vulnerabilities even if the clients' training data are not exposed. In this paper, we present a Privacy-Preserving Federated Averaging (PP-FedAvg) protocol specialized for healthcare settings to limit the leakage of private user data in FL. In particular, we protect the dataset sizes as well as the aggregated local update parameters through secure computation among clients based on homomorphic encryption. This approach ensures that the server cannot access the dataset sizes or the local update parameters while updating the global model. Our protocol has the advantage of protecting the dataset sizes even when datasets are not uniformly distributed among clients and when some clients drop out in each iteration.
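To make the idea in the abstract concrete, the following is a minimal sketch of size-weighted federated averaging computed over ciphertexts. It is not the PP-FedAvg protocol itself: the abstract does not specify the encryption scheme, key management, or dropout handling, so everything here is an assumption for illustration. The sketch uses a toy Paillier cryptosystem (additively homomorphic) with small, insecure primes, a hypothetical key pair shared among clients, and hypothetical helper names (`keygen`, `client_message`, `aggregate`, `decode_average`). Each client encrypts its size-weighted parameters and its dataset size; the aggregator multiplies ciphertexts, which adds the plaintexts, and never sees an individual size.

```python
import math
import random

SCALE = 10**6  # fixed-point scale for encoding real-valued parameters (assumption)

def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two Mersenne primes. Insecure; illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (may be negative; it is reduced mod n)."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n * mu) % n
    return m - n if m > n // 2 else m

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return (c1 * c2) % (n * n)

def client_message(n, weights, size):
    """Client side: encrypt size-weighted parameters and the dataset size."""
    enc_weighted = [encrypt(n, round(w * SCALE) * size) for w in weights]
    return enc_weighted, encrypt(n, size)

def aggregate(n, messages):
    """Aggregator side: combine ciphertexts without decrypting any of them."""
    agg_w, agg_n = list(messages[0][0]), messages[0][1]
    for enc_w, enc_n in messages[1:]:
        agg_w = [add(n, a, b) for a, b in zip(agg_w, enc_w)]
        agg_n = add(n, agg_n, enc_n)
    return agg_w, agg_n

def decode_average(n, priv, agg_w, agg_n):
    """Key holder: decrypt only the aggregates and form the weighted average."""
    total = decrypt(n, priv, agg_n)
    return [decrypt(n, priv, c) / (SCALE * total) for c in agg_w]

if __name__ == "__main__":
    n, priv = keygen()
    # Two clients with unbalanced dataset sizes (10 vs. 30 samples).
    msgs = [client_message(n, [0.5, -1.0], 10),
            client_message(n, [1.5, 2.0], 30)]
    agg_w, agg_n = aggregate(n, msgs)
    print(decode_average(n, priv, agg_w, agg_n))
```

In this sketch only the sums of the weighted parameters and of the dataset sizes are ever decrypted, which is the property the abstract describes. Who holds the decryption key, and how clients that drop out in an iteration are handled, are left open here and would follow the paper's actual construction.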

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
