Forest Science and Technology (Sep 2024)
Evaluation of statistical and machine learning models using satellite data to estimate aboveground biomass: A study in Vietnam Tropical Forests
Abstract
The combination of machine learning models with satellite imagery is becoming a popular data-modeling tool for biomass prediction, supporting land cover management. This study aims to select the most suitable model to estimate tropical forest aboveground biomass in Vietnam, helping to manage and monitor changes in biomass at regional and local scales. The study identified the optimal model for estimating forest aboveground biomass while minimizing the number of input variables and maintaining satisfactory model performance. A total of 59 input variables derived from satellite data, including topography, texture features, and vegetation indices, were used in four non-parametric algorithms, Artificial Neural Networks (ANN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), and in a conventional parametric model, Multiple Linear Regression (MLR), to predict biomass and evaluate changes in aboveground biomass over 10 years in two tropical forests in Vietnam. The results indicated that all models had good estimation performance, with R² ranging from 0.615 to 0.754. For RF, MLR, and XGBoost, vegetation indices contributed the highest model weights, accounting for 77.71%–92.48%. For ANN and SVM, textural and topographic features accounted for most of the model weights (73.74%–96.36%). The RF model performed best both with all 59 variables (R² = 0.754, MAE = 78.5 Mg·ha⁻¹, and %RMSE = 13.57%) and with ten variables (R² = 0.745, MAE = 85.8 Mg·ha⁻¹, and %RMSE = 16.17%). The biomass map produced with RF and ten variables achieved a good degree of fit of 0.76, making it suitable for managing and monitoring forest biomass in Vietnam. The results indicated a sharp decrease in the areas of dense and very dense forests from 2013 to 2021, followed by a gradual increase in 2023.
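The three reported metrics (R², MAE, and %RMSE) can be illustrated with a minimal Python sketch. The abstract does not give the formulas used, so this assumes the common definitions, with %RMSE taken as RMSE divided by the mean observed biomass; the function name `regression_metrics` and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return R2, MAE, RMSE and %RMSE for paired biomass values (Mg/ha).

    Assumption: %RMSE is RMSE relative to the mean observed value;
    the study may normalize differently.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)                # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")

    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": rmse,
        "pct_rmse": 100.0 * rmse / mean_obs,
    }


# Hypothetical plot-level values, for illustration only.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 310.0])
print(metrics)
```

Under these definitions, a lower %RMSE means the typical error is small relative to the average biomass of the plots, which is why it is often reported alongside the absolute MAE.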
Keywords