Atmosphere (Aug 2024)

Evaluation of Scikit-Learn Machine Learning Algorithms for Improving CMA-WSP v2.0 Solar Radiation Prediction

  • Dan Wang,
  • Yanbo Shen,
  • Dong Ye,
  • Yanchao Yang,
  • Xuanfang Da,
  • Jingyue Mo

DOI
https://doi.org/10.3390/atmos15080994
Journal volume & issue
Vol. 15, no. 8
p. 994

Abstract

This article evaluates the performance of solar radiation forecasts produced by CMA-WSP v2.0 (version 2 of the China Meteorological Administration Wind and Solar Energy Prediction System) and explores the use of machine learning algorithms from the scikit-learn Python library to improve those forecasts. The forecast performance of CMA-WSP v2.0 is closely related to weather conditions and shows notable diurnal fluctuations. The mean absolute percentage error (MAPE) of CMA-WSP v2.0 is approximately 74% between 11:00 and 13:00, but it ranges from 193% to 242% at 07:00–08:00 and 17:00–18:00, which is higher than at other daytime periods. The MAPE is relatively low in sunny and cloudy conditions, where the absolute percentage error has a high probability of being below 25%, and relatively high in overcast and rainy conditions, where it has a high probability of exceeding 100%. The forecasts tend to underestimate the observed solar radiation in sunny and cloudy conditions and to overestimate it in overcast and rainy conditions. Applying machine learning models (linear regression, decision trees, K-nearest neighbors, random forest regression, adaptive boosting, and gradient boosting regression) to revise the solar radiation forecasts significantly reduces the MAPE of CMA-WSP v2.0. The size of the reduction is closely connected to the weather conditions. The K-nearest neighbors, random forest regression, and decision tree models reduce the MAPE in all weather conditions. Of these, the K-nearest neighbors model performs best, particularly in rainy conditions, and the random forest regression model performs second best. The gradient boosting regression model reduces the MAPE of CMA-WSP v2.0 in all weather conditions except rainy conditions. In contrast, the adaptive boosting and linear regression models are less effective at improving the CMA-WSP v2.0 solar radiation prediction: adaptive boosting slightly reduces the MAPE only in sunny conditions, and linear regression only in sunny and cloudy conditions. In addition, input feature selection has a considerable influence on the performance of the machine learning models. Including time series data associated with the diurnal variation of solar radiation as an input feature further improves model performance.
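The revision approach summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The data below are synthetic, the two input features (the raw forecast and the hour of day) are hypothetical stand-ins for the paper's inputs, and all models use scikit-learn defaults rather than any tuned settings. The sketch only shows how the six named regressors could be fitted to correct a forecast and then compared by MAPE.

```python
import numpy as np
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor


def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed must be non-zero)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((predicted - observed) / observed)) * 100.0)


# Synthetic stand-in data (assumption, NOT the paper's data).
rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(7, 19, size=n).astype(float)          # daytime hours 07-18
clear_sky = 900.0 * np.sin(np.pi * (hour - 6.0) / 13.0)   # hypothetical diurnal shape
cloud = rng.uniform(0.3, 1.0, size=n)                     # hypothetical attenuation
observed = clear_sky * cloud                              # "truth", always > 0 here
# Raw forecast that responds too weakly to cloud, plus random noise.
forecast = clear_sky * (0.5 + 0.5 * cloud) + rng.normal(0.0, 30.0, size=n)

# Features: raw forecast and hour of day (a simple diurnal-variation feature).
X = np.column_stack([forecast, hour])
X_train, X_test, y_train, y_test = train_test_split(
    X, observed, test_size=0.25, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "k-nearest neighbors": KNeighborsRegressor(),
    "random forest": RandomForestRegressor(random_state=0),
    "adaptive boosting": AdaBoostRegressor(random_state=0),
    "gradient boosting": GradientBoostingRegressor(random_state=0),
}

# MAPE of the uncorrected forecast on the held-out samples.
baseline = mape(y_test, X_test[:, 0])

# MAPE of each model's revised forecast on the same held-out samples.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = mape(y_test, model.predict(X_test))

print(f"raw forecast: MAPE = {baseline:.1f}%")
for name, score in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: MAPE = {score:.1f}%")
```

Because the data are synthetic, the resulting ranking says nothing about the ranking reported in the paper. To mirror the paper's per-condition analysis, the same `mape` function could be applied separately to subsets of the test samples grouped by weather condition.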

Keywords