IEEE Access (Jan 2024)
Delay-Aware Online Resource Allocation for Buffer-Aided Synchronous Federated Learning Over Wireless Networks
Abstract
Synchronous federated learning (FL) over wireless networks often suffers from the straggler effect, which arises when the time required to train local models and upload the trained parameters varies significantly across heterogeneous wireless devices. This disparity prolongs model aggregation at the data center and slows the convergence of synchronous FL, posing a significant challenge for FL over wireless networks. In this paper, we propose a novel buffer-aided FL scheme to mitigate the straggler effect. A buffer with sufficiently large storage is deployed at each wireless device to temporarily store the collected training data and adaptively release it during local training, according to the computational capabilities and communication data rates of the wireless devices. Consequently, all local models can be aggregated synchronously at the data center, reducing the number of rounds required for model aggregation in FL. To ensure timely information updates, a staleness function is further introduced to characterize the freshness of the data used to train local models. Additionally, the entropic value-at-risk (EVaR) of the data queues is introduced to eliminate the impact of data discarded at the buffers and improve the accuracy of the trained local models. We formulate a delay-aware online stochastic optimization problem that minimizes the long-term average staleness of all wireless devices in buffer-aided FL. The formulation simultaneously guarantees the stability of the data queues at the wireless devices and reduces the risk of data loss. Using the Lyapunov optimization technique, we transform the problem into instantaneous deterministic optimization subproblems and solve each subproblem online by exploiting its hidden convexity. Simulation results demonstrate that the proposed buffer-aided synchronous FL scheme effectively improves the convergence rate of FL while ensuring timely synchronization of heterogeneous wireless devices.
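As a rough illustration of the Lyapunov step mentioned in the abstract, the following is a minimal sketch of the generic drift-plus-penalty rule for a single data queue. It is not the paper's formulation: the action set, the quadratic per-slot cost, the arrival process, and the names `drift_plus_penalty_step`, `simulate`, and `V` are all assumptions made for illustration, and the staleness function and the EVaR constraint are not modeled.

```python
import random


def drift_plus_penalty_step(queue, arrivals, actions, penalty, V):
    """One slot of the generic drift-plus-penalty rule (illustrative only).

    Chooses the amount b drawn from the queue that minimises
    V * penalty(b) - queue * b, then applies the standard queue update.
    """
    best = min(actions, key=lambda b: V * penalty(b) - queue * b)
    next_queue = max(queue - best, 0.0) + arrivals
    return best, next_queue


def simulate(T=2000, V=5.0, seed=0):
    """Run T slots and return (average per-slot cost, average queue length)."""
    rng = random.Random(seed)
    # Hypothetical amounts of buffered data consumed by local training per slot.
    actions = [0.0, 1.0, 2.0, 3.0]
    # Hypothetical convex per-slot cost; a stand-in, not the paper's staleness.
    penalty = lambda b: b ** 2
    queue, total_cost, total_queue = 0.0, 0.0, 0.0
    for _ in range(T):
        arrivals = rng.choice([0.0, 1.0, 2.0])  # assumed random data arrivals
        b, queue = drift_plus_penalty_step(queue, arrivals, actions, penalty, V)
        total_cost += penalty(b)
        total_queue += queue
    return total_cost / T, total_queue / T


if __name__ == "__main__":
    for V in (1.0, 5.0, 20.0):
        avg_cost, avg_queue = simulate(V=V)
        print(f"V={V}: average cost {avg_cost:.3f}, average queue {avg_queue:.3f}")
```

In this sketch the weight `V` trades the per-slot cost against queue backlog: a larger `V` lets the queue grow longer before data is drawn from it, which is the usual cost-versus-backlog trade-off in Lyapunov optimization.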
Keywords