IEEE Access (Jan 2024)

FedBoost: Bayesian Estimation Based Client Selection for Federated Learning

  • Yuhang Sheng,
  • Lingguo Zeng,
  • Shuqin Cao,
  • Qing Dai,
  • Shasha Yang,
  • Jianfeng Lu

DOI
https://doi.org/10.1109/ACCESS.2024.3359251
Journal volume & issue
Vol. 12
pp. 52255–52266

Abstract


Although federated learning (FL) is a distributed machine learning paradigm that ensures privacy protection, stragglers that fail to upload their local models in a timely manner degrade the global model's performance, and the difficulty of accurately predicting whether clients will succeed in uploading a local model makes client selection a persistent challenge. To address this issue, existing works mainly focus on increasing the number of clients that participate in training within a fixed time; however, the performance of a global model ultimately depends on the data used for training, so increasing the clients' data contribution to the global model can effectively enhance its performance. To this end, we propose a Bayesian estimation based FL framework, named FedBoost, to enhance model performance in the presence of stragglers. Specifically, we formulate a long-term problem that maximizes clients' cumulative effective data contributions while satisfying a long-term fairness constraint, which guarantees a minimum selection frequency for each client. By analyzing the stability of virtual queues, we transform the long-term problem into a stepwise one via Lyapunov optimization, reducing its computational complexity. Because the server cannot predict whether a client will successfully upload its local model before receiving the actual upload, we use Bayesian estimation based on the observed frequency of successful uploads to estimate this probability. Finally, extensive experimental results indicate that the average test accuracy of FedBoost is up to 5.59% higher than that of both FedAvg and FedCS on three real-world datasets, and that its test loss is at most 0.1646 below the two baselines. Furthermore, the value of the Lyapunov function remains below 1.4, and at least 85% of the estimated probabilities fall within a reasonable range.
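The two mechanisms the abstract describes lend themselves to a short illustration: a Beta-Bernoulli posterior over each client's upload-success probability, and a virtual queue whose stability enforces the minimum selection frequency. The following Python sketch is only an interpretation of those mechanisms under stated assumptions, not the paper's actual algorithm; the names (ClientState, select_clients, min_rate, the trade-off weight V) and the per-round score are hypothetical.

    import numpy as np

    class ClientState:
        def __init__(self, alpha=1.0, beta=1.0):
            # Beta(alpha, beta) prior over the client's upload-success probability
            self.alpha = alpha
            self.beta = beta
            self.queue = 0.0  # virtual queue for the fairness constraint

        def observe(self, uploaded: bool):
            # Bayesian update from the observed upload outcome (Bernoulli likelihood)
            if uploaded:
                self.alpha += 1.0
            else:
                self.beta += 1.0

        def success_prob(self) -> float:
            # Posterior mean estimate of the upload-success probability
            return self.alpha / (self.alpha + self.beta)

        def update_queue(self, selected: bool, min_rate: float):
            # Virtual queue: grows by min_rate each round and drains when selected;
            # keeping it stable enforces the minimum selection frequency
            self.queue = max(self.queue + min_rate - float(selected), 0.0)

    def select_clients(states, contributions, k, V=1.0):
        # Per-round (drift-plus-penalty style) score: trade off a client's
        # expected effective data contribution against its fairness backlog
        scores = [V * s.success_prob() * c + s.queue
                  for s, c in zip(states, contributions)]
        return np.argsort(scores)[-k:]  # indices of the k highest-scoring clients

In this reading, each round the server picks the top-k clients by score, observes which uploads actually arrive, calls observe() to refresh the posteriors, and calls update_queue() for every client; the Lyapunov argument in the paper is what justifies replacing the long-term objective with such a stepwise score.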

Keywords