International Journal of Applied Earth Observations and Geoinformation (Dec 2021)
Comparison of the backpropagation network and the random forest algorithm based on sampling distribution effects consideration for estimating nonphotosynthetic vegetation cover
Abstract
Non-photosynthetic vegetation (NPV) plays a crucial role in arid and semi-arid ecosystems. Remote sensing methods can extract NPV information accurately and quantitatively, which helps in studying the water use, community health, and climate response of vegetation communities. This study used the backpropagation network (BP) and random forest (RF) methods to test NPV cover extraction from Landsat 8 OLI images in the Mu Us Sandy Land. Pixel-level NPV cover, photosynthetic vegetation (PV) cover, and bare soil (BS) cover derived from unmanned aerial vehicle (UAV) field sampling data were used to train the BP and RF models. The generalisation ability of the BP and RF NPV detection models was evaluated using ten-fold cross-validation, and the influence of the sampling data distribution on the BP and RF fitting results was then assessed. The results were as follows: (1) With appropriate parameters and input layers selected, both BP and RF detected NPV with high accuracy, and the detection accuracy of the RF algorithm for PV and BS was slightly higher than that of the BP algorithm (NPV: R² = 0.8426 for RF and 0.8277 for BP; PV: R² = 0.8606 for RF and 0.8514 for BP; BS: R² = 0.8123 for RF and 0.7396 for BP). (2) When the BP and RF algorithms were used to predict geospatially continuous values, the distribution of the samples affected the final prediction results; of the two, the RF algorithm was less sensitive to the sample data distribution. (3) Random sampling was the best method for collecting training samples. Even with uniform sampling, a large difference between the distribution of the sampled values and the distribution of the real values led to a large deviation in the fitting result. This paper provides suggestions for fitting NPV cover in arid and semi-arid regions and provides a new method for evaluating the results of remote sensing regression fitting.
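The ten-fold cross-validation and R² scoring mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the one-variable least-squares line is a stand-in for the BP and RF models, the function names (r2_score, k_fold_indices, fit_line, cross_validated_r2) are hypothetical, and the data are synthetic values invented for illustration rather than UAV or Landsat measurements.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Stand-in model (NOT BP or RF): one-variable least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def cross_validated_r2(x, y, k=10, seed=0):
    """Pool the held-out predictions of all k folds and score them with R^2."""
    folds = k_fold_indices(len(x), k, seed)
    y_pred = [0.0] * len(x)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_line([x[i] for i in train],
                                    [y[i] for i in train])
        # Each sample is predicted only by a model that never saw it.
        for i in held_out:
            y_pred[i] = slope * x[i] + intercept
    return r2_score(y, y_pred)


# Synthetic data, invented for illustration only: a hypothetical spectral
# index `x` and a noisy fractional cover `y` clipped to [0, 1].
rng = random.Random(42)
x = [rng.random() for _ in range(200)]
y = [min(1.0, max(0.0, 0.8 * a + 0.1 + rng.gauss(0, 0.05))) for a in x]

print(f"10-fold cross-validated R^2: {cross_validated_r2(x, y):.3f}")
```

Pooling the held-out predictions before scoring gives a single R² per model, comparable in form to the per-class values reported above. Averaging a separate R² over the ten folds is an equally common convention, and the abstract does not state which one the study used.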