Energies (Jun 2024)

Improving Photovoltaic Power Prediction: Insights through Computational Modeling and Feature Selection

  • Ahmed Faris Amiri,
  • Aissa Chouder,
  • Houcine Oudira,
  • Santiago Silvestre,
  • Sofiane Kichou

DOI
https://doi.org/10.3390/en17133078
Journal volume & issue
Vol. 17, no. 13
p. 3078

Abstract

This work identifies the most effective supervised machine learning models for accurately estimating the power output of photovoltaic (PV) plants. Using experimental data, the performance of various regression models is analyzed, including Random Forest regressor, Support Vector regression (SVR), Multi-layer Perceptron regressor (MLP), Linear regressor (LR), Gradient Boosting, k-Nearest Neighbors regressor (KNN), Ridge regressor (Rr), Lasso regressor (Lsr), Polynomial regressor (Plr), and XGBoost regressor (XGB). The methodology starts with data preprocessing steps to ensure dataset integrity. Following the preprocessing phase, which entails eliminating missing values and outliers, feature selection based on a correlation threshold is performed to identify relevant parameters for accurate prediction in PV systems. Subsequently, Isolation Forest is employed for outlier detection, followed by model training and evaluation using key performance metrics, including Root-Mean-Squared Error (RMSE), Normalized Root-Mean-Squared Error (NRMSE), Mean Absolute Error (MAE), R-squared (R²), Integral Absolute Error (IAE), and Standard Deviation of the Difference (SDD). Among the models evaluated, Random Forest emerges as the top performer, with an RMSE of 19.413, an NRMSE of 0.048%, and an R² score of 0.968. Furthermore, the Random Forest regressor is integrated into a MATLAB application for real-time predictions, enhancing its usability and accessibility for a wide range of applications in renewable energy.
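
As an illustration of two steps named in the abstract, the following minimal Python sketch shows correlation-threshold feature selection and the listed evaluation metrics. It is not the authors' implementation. The abstract does not give the threshold value or the metric definitions, so the threshold of 0.5, the range-based normalization for NRMSE, the sum of absolute errors for IAE, and the population standard deviation for SDD are all assumptions. The feature names and data values are hypothetical, and the Isolation Forest and model-training steps are omitted.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.5):
    """Keep features whose |correlation| with the target meets the threshold.

    The default threshold is an assumption, not a value from the paper.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


def regression_metrics(actual, predicted):
    """Metrics named in the abstract, under assumed definitions."""
    n = len(actual)
    diff = [p - a for a, p in zip(actual, predicted)]
    ss_res = sum(d * d for d in diff)
    mu = mean(actual)
    ss_tot = sum((a - mu) ** 2 for a in actual)
    rmse = math.sqrt(ss_res / n)
    return {
        "RMSE": rmse,
        "NRMSE": rmse / (max(actual) - min(actual)),  # assumed: range-normalized
        "MAE": sum(abs(d) for d in diff) / n,
        "R2": 1 - ss_res / ss_tot,
        "IAE": sum(abs(d) for d in diff),  # assumed: unit sampling interval
        "SDD": pstdev(diff),  # assumed: population standard deviation
    }


# Toy data (hypothetical values, not taken from the paper)
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
features = {"irradiance": [100, 200, 300, 400], "noise": [1, -1, -1, 1]}

selected = select_features(features, actual)
metrics = regression_metrics(actual, predicted)
print(selected)
print(metrics)
```

On this toy data only the hypothetical "irradiance" column passes the threshold, and the errors alternate between +10 and -10, so RMSE, MAE, and SDD all come out at 10.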

Keywords