International Journal of Applied Earth Observations and Geoinformation (Sep 2021)
Modeling tree canopy height using machine learning over mixed vegetation landscapes
Abstract
Although the random forest algorithm has been widely applied to remotely sensed data to predict forest characteristics such as tree canopy height, the effect of spatial non-stationarity on the modeling process is often neglected. Previous studies have proposed methods to address spatial variance at local scales, but few have explored the spatial autocorrelation pattern of residuals in modeling tree canopy height or investigated the relationship between canopy height and model performance. By combining Light Detection and Ranging (LiDAR) and Landsat datasets, we used spatially weighted geographical random forest (GRF) and traditional random forest (TRF) methods to predict tree canopy height in a mixed dry forest woodland in complex mountainous terrain. Comparisons between the TRF and GRF models show that the latter can reduce predefined extreme residuals and thus yield relatively stronger model performance. Moreover, the relationship between model performance and the degree of variation in true canopy height can vary considerably across height quantiles. Both models tend to underestimate canopy height where it is high (>95% quantile) and to overestimate it where it is low (<median). This study provides critical insight into the relationship between tree canopy height and the predictive ability of random forest models when spatial non-stationarity is taken into account. The conclusions indicate that model selection should follow a trade-off approach based on the actual needs of the project, integrating both local and global effects when modeling attributes such as canopy height from remotely sensed data.
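The height-dependent bias described above (overestimates below the median, underestimates above the 95% quantile) can be checked with a simple residual summary. The following is a minimal sketch, not the study's procedure: the function name `residual_bias_by_height_class`, the residual convention (predicted minus observed), and the toy data are all assumptions made for illustration. The toy predictions are simply shrunk toward the overall mean height, which is one plausible way such a bias pattern could arise; no values here come from the paper.

```python
from statistics import mean, median, quantiles


def residual_bias_by_height_class(observed, predicted):
    """Mean residual (predicted - observed) for low and high canopy-height classes.

    Hypothetical helper: a positive mean indicates overestimation,
    a negative mean indicates underestimation.
    """
    med = median(observed)
    # 95th percentile: the last of the 19 cut points that split the data into 20 groups
    p95 = quantiles(observed, n=20, method="inclusive")[-1]

    low = [p - o for o, p in zip(observed, predicted) if o < med]
    high = [p - o for o, p in zip(observed, predicted) if o > p95]

    return {
        "low": mean(low) if low else None,     # heights below the median
        "high": mean(high) if high else None,  # heights above the 95% quantile
    }


# Toy data (illustrative only): observed heights of 1..40 m, with predictions
# pulled 40% of the way toward the overall mean height.
observed = [float(h) for h in range(1, 41)]
overall = mean(observed)
predicted = [overall + 0.6 * (h - overall) for h in observed]

bias = residual_bias_by_height_class(observed, predicted)
print(bias)
```

With real data, `observed` would hold the LiDAR-derived canopy heights and `predicted` the TRF or GRF outputs at the same locations, so the two models can be compared class by class.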