Evaluation of Various Tree-Based Ensemble Models for Estimating Solar Energy Resource Potential in Different Climatic Zones of China

Zhigao Zhou; Aiwen Lin; Lijie He; Lunche Wang

doi:10.3390/en15093463

Energies (May 2022)

Affiliations

Zhigao Zhou: Shenzhen Longhua High School, Longhua District, Shenzhen 518109, China
Aiwen Lin: School of Resource and Environmental Science, Wuhan University, Wuhan 430079, China
Lijie He: College of Public Administration, Huazhong Agricultural University, Wuhan 430070, China
Lunche Wang: Laboratory of Critical Zone Evolution, School of Earth Sciences, China University of Geosciences, Wuhan 430074, China

DOI: https://doi.org/10.3390/en15093463
Journal volume & issue: Vol. 15, no. 9
p. 3463

Abstract

Solar photovoltaic (PV) electricity generation is growing rapidly in China. Accurate estimation of solar energy resource potential (Rs) is crucial for siting, designing, evaluating and optimizing PV systems. Seven types of tree-based ensemble models, including classification and regression trees (CART), extremely randomized trees (ET), random forest (RF), gradient boosting decision tree (GBDT), extreme gradient boosting (XGBoost), gradient boosting with categorical features support (CatBoost) and light gradient boosting machine (LightGBM), as well as the multi-layer perceptron (MLP) and support vector machine (SVM), were applied to estimate Rs using k-fold cross-validation. The three newly developed models (CatBoost, LightGBM, XGBoost) and the GBDT model generally outperformed the other five models, with satisfactory accuracy (on average, R² ranging from 0.893 to 0.916, RMSE from 1.943 to 2.195 MJ m−2 d−1, and MAE from 1.457 to 1.646 MJ m−2 d−1), and provided acceptable stability (testing RMSE exceeding training RMSE by 8.3% to 31.9%) under seven input combinations. In addition, CatBoost (12.3 s), LightGBM (13.9 s), XGBoost (20.5 s) and GBDT (16.8 s) were considerably faster than MLP (132.1 s) and SVM (256.8 s). Considering model accuracy, stability and computational time together, the newly developed tree-based models (CatBoost, LightGBM, XGBoost) and the commonly used GBDT model are recommended for modeling Rs in the contrasting climates of China and possibly in similar climatic zones elsewhere in the world. By evaluating three newly developed tree-based ensemble models for estimating Rs across China's climates in terms of accuracy, stability and computational efficiency, this study offers a new perspective on the indicators used to evaluate machine learning methods.
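The accuracy and stability indicators named in the abstract (R², RMSE, MAE, and the percentage by which testing RMSE exceeds training RMSE), together with k-fold splitting, can be sketched as follows. This is a minimal, dependency-free Python sketch under stated assumptions: R² is taken as the coefficient of determination (1 − SS_res/SS_tot) rather than the squared Pearson correlation, folds are contiguous and unshuffled, and all function names are hypothetical. The paper's exact metric definitions, value of k and fold assignment are not given in this abstract.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of Rs (e.g. MJ m-2 d-1)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of Rs."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: the paper may instead report the squared Pearson correlation.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse_increase_pct(train_rmse: float, test_rmse: float) -> float:
    """Stability indicator: percentage by which testing RMSE exceeds training RMSE."""
    return (test_rmse - train_rmse) / train_rmse * 100.0


def kfold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds (no shuffling).

    Returns a list of (train_indices, test_indices) pairs; each sample is
    in exactly one test fold. The first n_samples % k folds get one extra sample.
    """
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits
```

For hypothetical training and testing RMSE values of 1.8 and 2.1 MJ m−2 d−1, `rmse_increase_pct(1.8, 2.1)` returns about 16.7, which would fall inside the 8.3% to 31.9% range reported above. In practice, scikit-learn's `sklearn.model_selection.KFold` (with its `n_splits`, `shuffle` and `random_state` parameters) is a common ready-made alternative to the hand-written splitter.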

Published in Energies

ISSN: 1996-1073 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/energies
