Remote Sensing (Mar 2020)
Estimating the Growing Stem Volume of Chinese Pine and Larch Plantations based on Fused Optical Data Using an Improved Variable Screening Method and Stacking Algorithm
Abstract
Accurately estimating growing stem volume (GSV) is very important for forest resource management. The GSV estimation is affected by remote sensing images, variable selection methods, and estimation algorithms. Optical images have been widely used for modeling key attributes of forest stands, including GSV and aboveground biomass (AGB), because of their easy availability, large coverage and related mature data processing and analysis technologies. However, the low data saturation level and the difficulty of selecting feature variables from optical images often impede the improvement of estimation accuracy. In this research, two GaoFen-2 (GF-2) images, a Landsat 8 image, and fused images created by integrating GF-2 bands with the Landsat multispectral image using the Gram–Schmidt method were first used to derive various feature variables and obtain various datasets or data scenarios. A DC-FSCK approach that integrates feature variable screening and a combination optimization procedure based on the distance correlation coefficient and k-nearest neighbors (kNN) algorithm was proposed and compared with the stepwise regression analysis (SRA) and random forest (RF) for feature variable selection. The DC-FSCK considers the self-correlation and combination effect among feature variables so that the selected variables can improve the accuracy and saturation level of GSV estimation. To validate the proposed approach, six estimation algorithms were examined and compared, including Multiple Linear Regression (MLR), kNN, Support Vector Regression (SVR), RF, eXtreme Gradient Boosting (XGBoost) and Stacking. The results showed that compared with GF-2 and Landsat 8 images, overall, the fused image (Red_Landsat) of GF-2 red band with Landsat 8 multispectral image improved the GSV estimation accuracy of Chinese pine and larch plantations. The Red_Landsat image also performed better than other fused images (Pan_Landsat, Blue_Landsat, Green_Landsat and Nir_Landsat). For most of the combinations of the datasets and estimation models, the proposed variable selection method DC-FSCK led to more accurate GSV estimates compared with SRA and RF. In addition, in most of the combinations obtained by the datasets and variable selection methods, the Stacking algorithm performed better than other estimation models. More importantly, the combination of the fused image Red_Landsat with the DC-FSCK and Stacking algorithm led to the best performance of GSV estimation with the greatest adjusted coefficients of determination, 0.8127 and 0.6047, and the smallest relative root mean square errors of 17.1% and 20.7% for Chinese pine and larch, respectively. This study provided new insights on how to choose suitable optical images, variable selection methods and optimal modeling algorithms for the GSV estimation of Chinese pine and larch plantations.
Keywords