NIFL: A Statistical Measures-Based Method for Client Selection in Federated Learning

M'haouach Mohamed; Anass Houdou; Hamza Alami; Khalid Fardousse; Ismail Berrada

doi:10.1109/ACCESS.2022.3225407

IEEE Access (Jan 2022)

Affiliations

M'haouach Mohamed: Sidi Mohamed Ben Abdellah University (USMBA), Fez, Morocco
Anass Houdou: International School of Public Health, Mohammed VI University of Health Sciences, Casablanca, Morocco
Hamza Alami: School of Computer Science, Mohammed VI Polytechnic University, Ben Guerir, Morocco
Khalid Fardousse: Sidi Mohamed Ben Abdellah University (USMBA), Fez, Morocco
Ismail Berrada: School of Computer Science, Mohammed VI Polytechnic University, Ben Guerir, Morocco

DOI: https://doi.org/10.1109/ACCESS.2022.3225407
Journal volume & issue: Vol. 10
pp. 124766 – 124776

Abstract

Federated learning (FL) has been proposed as a machine learning approach for collaboratively learning a shared prediction model. Although only a subset of workers participates in each round of FL training, existing approaches introduce model bias when averaging the local model parameters of heterogeneous workers, which degrades the accuracy of the learned global model. In this paper, we introduce NIFL, a new worker selection strategy that handles the statistical challenges of FL when local data is Non-Independent and Identically Distributed (N-IID). In NIFL, the server first sends a signal to the workers, which respond by reporting their number of samples. The server then selects a percentage of the workers with the highest number of samples and requests data statistics such as the mean and standard deviation. Next, the server calculates our proposed N-IID index from the statistical information collected from the workers, without having access to their data, and uses this index as a criterion for worker selection. Finally, the server broadcasts the global model to the selected workers. NIFL takes into account the disparity in the distribution of workers’ data in order to improve the performance of the model in heterogeneous data environments. We have performed several experiments with N-IID data. The results show that both the convergence of our method and its test accuracy improve considerably compared with the other techniques, while keeping computation and communication costs reasonable.
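
The selection rounds described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the formula for the N-IID index, so `placeholder_index` below is a hypothetical stand-in (distance of a worker's reported mean and standard deviation from the sample-weighted averages over the candidates). Preferring workers with the lowest score is likewise an assumption. The names `WorkerStats`, `top_by_samples`, and `select_workers` are invented for this example.

```python
from dataclasses import dataclass
import math


@dataclass
class WorkerStats:
    """Summary a worker reports to the server; raw data never leaves the worker."""
    worker_id: str
    num_samples: int
    mean: float
    std: float


def top_by_samples(workers, fraction):
    """Keep the given fraction of workers with the highest sample counts."""
    keep = max(1, math.ceil(fraction * len(workers)))
    return sorted(workers, key=lambda w: w.num_samples, reverse=True)[:keep]


def placeholder_index(worker, candidates):
    """Hypothetical stand-in score, NOT the paper's N-IID index.

    Measures how far a worker's reported mean and std lie from the
    sample-weighted averages of those statistics over all candidates.
    """
    total = sum(w.num_samples for w in candidates)
    avg_mean = sum(w.num_samples * w.mean for w in candidates) / total
    avg_std = sum(w.num_samples * w.std for w in candidates) / total
    return abs(worker.mean - avg_mean) + abs(worker.std - avg_std)


def select_workers(workers, fraction, k):
    """Filter by sample count, score the candidates, return k worker ids.

    Assumption: workers with the lowest placeholder score are preferred.
    """
    candidates = top_by_samples(workers, fraction)
    ranked = sorted(candidates, key=lambda w: placeholder_index(w, candidates))
    return [w.worker_id for w in ranked[:k]]
```

In a real deployment the server would then broadcast the global model only to the returned worker ids; the scoring function would be replaced by the index defined in the paper.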

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
