IEEE Access (Jan 2024)
Revealing the Effects of Data Heterogeneity in Federated Learning Regression Models for Short-Term Solar Power Forecasting
Abstract
Accurate short-term power forecasting is crucial for the successful commercialization of solar energy, as it helps prevent financial losses in energy markets. Federated learning (FL) offers a promising approach for power forecasting with small databases, enabling marketers to collaboratively develop models without exchanging sensitive data. An initial simulation of this FL use case reveals a clear discrepancy between the solar power forecasting performance of the FL model and that of central learning (CL) models trained on each marketer’s data. While some marketers benefit from participating in FL, others achieve far better results with individually trained forecasting models. This paper demonstrates that the discrepancy is due to heterogeneity between the marketers’ data. As heterogeneity is a poorly researched area in FL regression, a taxonomy of heterogeneity characteristics (skews) is first proposed. Next, the influence of the skews is analyzed in a simulation study. The study shows that differences in autocorrelation and in the number of data points among the time series of FL clients have the greatest impact on the performance of the FL regression model. A first heterogeneity quantification approach based on ARMA-GARCH models is proposed to address this problem in the development of federated solar power forecasting systems and other federated regression problems. In addition, the paper highlights the need for further investigation of heterogeneity in federated regression problems.
Keywords