Energies (May 2022)

Evaluation of Various Tree-Based Ensemble Models for Estimating Solar Energy Resource Potential in Different Climatic Zones of China

  • Zhigao Zhou,
  • Aiwen Lin,
  • Lijie He,
  • Lunche Wang

DOI
https://doi.org/10.3390/en15093463
Journal volume & issue
Vol. 15, no. 9
p. 3463

Abstract

Read online

Solar photovoltaic (PV) electricity generation is growing rapidly in China. Accurate estimation of solar energy resource potential (Rs) is crucial for siting, designing, evaluating and optimizing PV systems. Seven types of tree-based ensemble models, including classification and regression trees (CART), extremely randomized trees (ET), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost) and light gradient boosting method (LightGBM), as well as the multi-layer perceotron (MLP) and support vector machine (SVM), were applied to estimate Rs using a k-fold cross-validation method. The three newly developed models (CatBoost, LighGBM, XGBoost) and GBDT model generally outperformed the other five models with satisfactory accuracy (R2 ranging from 0.893–0.916, RMSE ranging from 1.943–2.195 MJm−2d−1, and MAE ranging from 1.457–1.646 MJm−2d−1 on average) and provided acceptable model stability (increasing the percentage in testing RMSE over training RMSE from 8.3% to 31.9%) under seven input combinations. In addition, the CatBoost (12.3 s), LightGBM (13.9 s), XGBoost (20.5 s) and GBDT (16.8 s) exhibited satisfactory computational efficiency compared with the MLP (132.1 s) and SVM (256.8 s). Comprehensively considering the model accuracy, stability and computational time, the newly developed tree-based models (CatBoost, LighGBM, XGBoost) and commonly used GBDT model were recommended for modeling Rs in contrasting climates of China and possibly similar climatic zones elsewhere around the world. This study evaluated three newly developed tree-based ensemble models of estimating Rs in various climates of China, from model accuracy, model stability and computational efficiency, which provides a new look at indicators of evaluating machine learning methods.

Keywords