Remote Sensing (Feb 2019)

A Comparison of Hybrid Machine Learning Algorithms for the Retrieval of Wheat Biophysical Variables from Sentinel-2

  • Deepak Upreti,
  • Wenjiang Huang,
  • Weiping Kong,
  • Simone Pascucci,
  • Stefano Pignatti,
  • Xianfeng Zhou,
  • Huichun Ye,
  • Raffaele Casa

DOI
https://doi.org/10.3390/rs11050481
Journal volume & issue
Vol. 11, no. 5
p. 481

Abstract

This study compares hybrid methods for estimating biophysical variables, such as leaf area index (LAI), leaf chlorophyll content (LCC), fraction of absorbed photosynthetically active radiation (FAPAR), fraction of vegetation cover (FVC), and canopy chlorophyll content (CCC), from Sentinel-2 satellite data. Different machine learning algorithms were trained with simulated spectra generated by the physically based radiative transfer model PROSAIL and then applied to Sentinel-2 reflectance spectra. The algorithms were assessed against a standard operational approach, i.e., the neural-network-based approach of the European Space Agency (ESA) Sentinel Application Platform (SNAP) toolbox. Because kernel-based algorithms are computationally expensive to train on large datasets, an active learning (AL) strategy was explored to reduce this cost. Validation was carried out using ground data from two study sites: Shunyi (China) and Maccarese (Italy). In general, the algorithms performed consistently at the two sites, although the level of accuracy differed between them, possibly because of slightly different ground sampling protocols and differences in the range and variability of the biophysical variables in the two ground datasets. For LAI, the best ground validation results at both sites were obtained with least squares linear regression (LSLR) and partial least squares regression; the best performance overall was achieved by LSLR at Maccarese, with R² = 0.78, root mean squared error (RMSE) = 0.68 m² m⁻², and relative RMSE (RRMSE) = 19.48%. For LCC, the best results were obtained with Random Forest Tree Bagger (RFTB) and Bagging Trees (BagT), with the best performance again at Maccarese, using RFTB (R² = 0.26, RMSE = 8.88 μg cm⁻², RRMSE = 17.43%).
Gaussian Process Regression (GPR) was the best algorithm for all variables in the cross-validation phase, but not in the ground validation, where it ranked best only for FVC at Maccarese (R² = 0.90, RMSE = 0.08, RRMSE = 9.86%). The AL strategy was found to be more efficient than random sample selection for training the GPR algorithm.
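
The accuracy figures in the abstract are reported as R², RMSE, and RRMSE. A minimal sketch of how such metrics can be computed from paired ground and estimated values follows. It assumes R² is the squared Pearson correlation and RRMSE is RMSE divided by the mean of the ground values, in percent; the abstract does not state the exact formulas, and other conventions (e.g. the coefficient of determination, or normalization by the range) exist. The function name and the LAI values are hypothetical.

```python
import numpy as np

def validation_metrics(observed, estimated):
    """Return (R², RMSE, RRMSE in %) for paired ground/estimated values.

    Assumptions (not confirmed by the source): R² is the squared Pearson
    correlation coefficient, and RRMSE is RMSE relative to the mean of the
    observed (ground) values, expressed in percent.
    """
    obs = np.asarray(observed, dtype=float)
    est = np.asarray(estimated, dtype=float)
    if obs.shape != est.shape or obs.size < 2:
        raise ValueError("need two equal-length arrays with at least 2 values")
    r = np.corrcoef(obs, est)[0, 1]              # Pearson correlation
    rmse = np.sqrt(np.mean((est - obs) ** 2))    # same units as the variable
    rrmse = 100.0 * rmse / np.mean(obs)          # relative to the ground mean
    return r ** 2, rmse, rrmse

# Hypothetical LAI values (m² m⁻²), invented for illustration only.
lai_ground = [1.2, 2.5, 3.1, 4.0, 2.2]
lai_estimated = [1.5, 2.3, 3.4, 3.6, 2.0]
r2, rmse, rrmse = validation_metrics(lai_ground, lai_estimated)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f} m2 m-2, RRMSE = {rrmse:.2f}%")
```

Because RRMSE divides by the mean of the ground values, the same RMSE yields a larger RRMSE on a dataset with lower average values, which is one reason accuracy can differ between sites with different value ranges.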
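
The abstract motivates AL by the cost of training kernel methods: exact GPR factorizes an n × n kernel matrix, which scales as O(n³) in the number of training samples, so choosing a small but informative subset of the simulated spectra keeps training tractable. The abstract does not specify the AL criterion, so the sketch below assumes a simple pool-based uncertainty-sampling rule (add the pool sample with the largest GPR predictive variance) and compares it with random selection. The one-dimensional synthetic pool is a hypothetical stand-in for PROSAIL-simulated spectra, and the kernel, hyperparameters, and function names are all illustrative assumptions, not the study's setup.

```python
import numpy as np

PRIOR_VARIANCE = 1.0   # hypothetical kernel amplitude
LENGTH_SCALE = 0.2     # hypothetical kernel length scale
NOISE_VARIANCE = 1e-2  # hypothetical observation-noise variance

def rbf_kernel(a, b):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return PRIOR_VARIANCE * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)

def gpr_predict(x_train, y_train, x_query):
    """Exact zero-mean GPR posterior mean and variance, fixed hyperparameters.

    The Cholesky factorization of the n x n training kernel costs O(n^3).
    """
    k = rbf_kernel(x_train, x_train) + NOISE_VARIANCE * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    k_star = rbf_kernel(x_train, x_query)
    mean = k_star.T @ alpha
    v = np.linalg.solve(chol, k_star)
    var = PRIOR_VARIANCE - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative round-off

def select_training_set(x_pool, y_pool, n_init, n_final, strategy, rng):
    """Grow a training subset from a simulated pool, one sample at a time.

    strategy="active": add the pool sample with the largest predictive
    variance (uncertainty sampling, an assumed AL criterion).
    strategy="random": add a pool sample chosen uniformly at random.
    """
    idx = list(rng.choice(len(x_pool), size=n_init, replace=False))
    while len(idx) < n_final:
        remaining = np.setdiff1d(np.arange(len(x_pool)), idx)
        if strategy == "active":
            _, var = gpr_predict(x_pool[idx], y_pool[idx], x_pool[remaining])
            idx.append(remaining[np.argmax(var)])
        else:
            idx.append(rng.choice(remaining))
    return np.array(idx)

# Synthetic stand-in for a simulated training pool: one input feature and
# one noisy target. Purely illustrative, not PROSAIL output.
rng = np.random.default_rng(0)
x_pool = rng.uniform(0.0, 1.0, 400)
y_pool = np.sin(6.0 * x_pool) + 0.5 * x_pool + rng.normal(0.0, 0.05, x_pool.size)
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(6.0 * x_test) + 0.5 * x_test

for strategy in ("active", "random"):
    # Same seed for both strategies, so they share the same 3 initial samples.
    sel = select_training_set(x_pool, y_pool, 3, 15, strategy,
                              np.random.default_rng(42))
    pred, _ = gpr_predict(x_pool[sel], y_pool[sel], x_test)
    test_rmse = np.sqrt(np.mean((pred - y_test) ** 2))
    print(f"{strategy:>6}: {len(sel)} training samples, test RMSE = {test_rmse:.3f}")
```

With fixed hyperparameters, the predictive variance does not depend on the target values, so this particular criterion behaves like a space-filling design; the AL criteria used in the study may differ. The printed RMSE values for the two strategies are not guaranteed to favor AL on every seed.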

Keywords