IEEE Access (Jan 2024)
FedDSL: A Novel Client Selection Method to Handle Statistical Heterogeneity in Cross-Silo Federated Learning Using Flower Framework
Abstract
Federated learning provides a mechanism for different silos to collaborate, with each silo benefiting without compromising privacy. This simulation study is based on healthcare datasets, so the silos are hospitals or healthcare organizations. Careful selection of the hospitals that participate in federated learning can increase the overall performance of the model. Cross-silo federated learning comes with many challenges, even though the number of participating clients is limited compared to cross-device federated learning. This study specifically addresses two of those challenges: heterogeneity of data and local performance. This paper introduces an approach called FedDSL, based on ‘Datasize’, ‘Skewness’, and ‘Local Performance’. Initially, synthetic data are generated with varying data size and skewness, which creates statistical heterogeneity in the cross-silo environment. Once this environment is created, a weighted client selection strategy is applied. A statistical analysis using skewness and normality tests checks how the data are distributed among hospitals. Experiments are conducted using the Flower Framework, and FedDSL is compared with random client selection. The model is trained with various aggregation algorithms, including FedAvg, FedProx, and FedAdam. The results show improved model performance with the FedDSL approach compared to random client selection.
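The weighted selection idea described above can be illustrated with a minimal sketch. It is not the FedDSL formula itself, which the abstract does not specify. The weights, the normalization, the choice to favor low absolute skewness, the use of label values to measure skewness, and all names (`ClientStats`, `score_clients`, `select_clients`, the hospital entries) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class ClientStats:
    name: str
    labels: list           # local target values (assumed basis for size and skewness)
    local_accuracy: float  # local validation accuracy in [0, 1]


def sample_skewness(values):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5 (0.0 for constant data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def score_clients(clients, w_size=0.4, w_skew=0.3, w_perf=0.3):
    """Combine data size, skewness, and local performance into one score.

    Hypothetical weighting: larger data, lower |skewness|, and higher
    local accuracy all raise the score.
    """
    max_size = max(len(c.labels) for c in clients)
    skews = {c.name: abs(sample_skewness(c.labels)) for c in clients}
    max_skew = max(skews.values()) or 1.0  # avoid division by zero
    scores = {}
    for c in clients:
        size_term = len(c.labels) / max_size
        skew_term = 1.0 - skews[c.name] / max_skew
        scores[c.name] = (
            w_size * size_term + w_skew * skew_term + w_perf * c.local_accuracy
        )
    return scores


def select_clients(clients, k, **weights):
    """Return the names of the k highest-scoring clients."""
    scores = score_clients(clients, **weights)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic, made-up hospitals with different sizes and label balance.
hospitals = [
    ClientStats("hospital_a", [0, 1] * 50, 0.80),        # large, balanced
    ClientStats("hospital_b", [0] * 45 + [1] * 5, 0.85),  # medium, skewed
    ClientStats("hospital_c", [0, 1] * 10, 0.60),        # small, balanced
]
selected = select_clients(hospitals, k=2)
print(selected)
```

With these illustrative weights, the skewed hospital is passed over despite its higher local accuracy. In a Flower-based experiment, logic of this kind would typically sit in a custom strategy's client sampling step, and the normality check could be done with a standard statistical test rather than the hand-rolled skewness used here.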
Keywords