Remote Sensing (May 2022)
Prototyping Crop Traits Retrieval Models for CHIME: Dimensionality Reduction Strategies Applied to PRISMA Data
Abstract
In preparation for new-generation imaging spectrometer missions and the accompanying unprecedented inflow of hyperspectral data, optimized models are needed to generate vegetation traits routinely. Hybrid models, which combine radiative transfer models with machine learning algorithms, are preferred; however, dealing with spectral collinearity imposes an additional challenge. In this study, we analyzed two spectral dimensionality reduction methods, principal component analysis (PCA) and band ranking (BR), embedded in a hybrid workflow for the retrieval of specific leaf area (SLA), leaf area index (LAI), canopy water content (CWC), canopy chlorophyll content (CCC), the fraction of absorbed photosynthetically active radiation (FAPAR), and fractional vegetation cover (FVC). The SCOPE model was used to simulate training data sets, which were optimized with active learning. Gaussian process regression (GPR) algorithms were trained on the simulations to obtain trait-specific models. Combining GPR with PCA or BR, each using 20 features, led to the so-called GPR-20PCA and GPR-20BR models. The 20 principal components of the GPR-20PCA models captured over 99.95% of the cumulative variance of the full spectral data, while the GPR-20BR models were based on the 20 most sensitive bands. Validation against in situ data yielded moderate to optimal results, with normalized root mean squared error (NRMSE) ranging from 13.9% (CWC) to 22.3% (CCC) for the GPR-20PCA models and from 19.6% (CWC) to 29.1% (SLA) for the GPR-20BR models. Overall, the GPR-20PCA models slightly outperformed the GPR-20BR models for all six variables. To demonstrate mapping capabilities, both models were applied to a PRecursore IperSpettrale della Missione Applicativa (PRISMA) scene, spectrally resampled to the Copernicus Hyperspectral Imaging Mission for the Environment (CHIME) configuration, over an agricultural test site (Jolanda di Savoia, Italy).
The two strategies produced plausible spatial patterns; consistency between the two models was highest for FVC and LAI (R² = 0.91 and R² = 0.86, respectively) and lowest for SLA (R² = 0.53). Based on these findings, we recommend implementing GPR-20PCA models as the most efficient strategy for retrieving multiple crop traits from hyperspectral data streams. This workflow will thus support and facilitate the preparation of trait retrieval models for the next-generation operational CHIME mission.
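As an illustration of the GPR-20PCA idea described above, the following is a minimal sketch and not the implementation used in this study. It assumes synthetic spectra in place of SCOPE simulations, omits active learning, uses a squared-exponential kernel with fixed (not optimized) hyperparameters, and assumes NRMSE is the RMSE normalized by the range of the reference values. All function and variable names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean spectrum, leading principal axes, and their variance ratios."""
    mean = X.mean(axis=0)
    # SVD of the centered spectra; rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return mean, Vt[:n_components], ratio[:n_components]

def rbf_kernel(A, B, length_scale):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_fit_predict(Z_train, y_train, Z_test, length_scale=2.0, noise=1e-3):
    """GPR posterior mean with fixed hyperparameters (assumed, not optimized)."""
    y_mean = y_train.mean()
    K = rbf_kernel(Z_train, Z_train, length_scale) + noise * np.eye(len(Z_train))
    alpha = np.linalg.solve(K, y_train - y_mean)
    return rbf_kernel(Z_test, Z_train, length_scale) @ alpha + y_mean

def nrmse_percent(y_true, y_pred):
    """RMSE divided by the range of the reference values, in percent (assumed definition)."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / (y_true.max() - y_true.min())

# Synthetic stand-in for simulated spectra: 3 latent factors spread over 200 collinear bands.
rng = np.random.default_rng(0)
n_samples, n_bands, n_train = 300, 200, 200
latent = rng.normal(size=(n_samples, 3))
X = latent @ rng.normal(size=(3, n_bands)) + 0.01 * rng.normal(size=(n_samples, n_bands))
y = 3.0 + np.tanh(latent[:, 0]) + 0.5 * latent[:, 1]  # hypothetical trait values

X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Reduce the spectra to 20 principal-component scores, fitted on training data only.
mean, axes, var_ratio = fit_pca(X_train, n_components=20)
Z_train = (X_train - mean) @ axes.T
Z_test = (X_test - mean) @ axes.T

# One global scale factor keeps the relative weight of the components intact.
scale = Z_train[:, 0].std()
y_pred = gpr_fit_predict(Z_train / scale, y_train, Z_test / scale)

cumulative_variance = var_ratio.sum()
score = nrmse_percent(y_test, y_pred)
print(f"cumulative variance of 20 PCs: {cumulative_variance:.5f}")
print(f"NRMSE on held-out samples: {score:.2f}%")
```

In practice the kernel hyperparameters would be optimized during training, and a band ranking variant would replace the PCA projection with a selection of 20 columns of the spectral matrix.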
Keywords