Geoderma (Jan 2025)
Simultaneous estimation of multiple soil properties from vis-NIR spectra using a multi-gate mixture-of-experts with data augmentation
Abstract
Simultaneous estimation of multiple soil properties from vis-NIR hyperspectra is a cost-effective and time-efficient approach. Previous studies have used multi-task convolutional neural networks (multi-CNN) with shared-bottom structures based on hard parameter sharing. However, multi-CNN often ignores differences in the correlations between soil properties, which limits estimation accuracy. The multi-gate mixture-of-experts network (MMoE) offers a solution by extracting both features common to all soil properties and features specific to each property, and may therefore provide better estimates than the conventional shared-bottom multi-CNN. In the present study, an MMoE was built from a total of 17,272 mineral soil samples from the Land Use/Cover Area Frame Survey (LUCAS) topsoil database, which includes vis-NIR spectra and ten physicochemical properties, i.e., clay, silt, sand, pH (in water), organic carbon (OC), calcium carbonate (CaCO3), nitrogen (N), phosphorus (P), potassium (K), and cation exchange capacity (CEC). To evaluate the performance of MMoE, several other models were also built, i.e., partial least squares regression (PLSR), single-task convolutional neural network (single-CNN), multi-task convolutional neural network (multi-CNN), and multi-task long short-term memory (multi-LSTM). Furthermore, the effect on MMoE accuracy of feature spectra selected by competitive adaptive reweighted sampling (CARS) was explored, as was a data augmentation method that stacks raw spectra with five preprocessed versions of the spectra. The results demonstrated that MMoE had higher accuracy than the PLSR, single-CNN, and multi-LSTM models, with RMSE reduction of 5 %–48 %, R2 improvement of 1 %–119 %, and CCC improvement of 0 %–74 %.
Compared with multi-CNN, MMoE showed better accuracy for all properties except pH, with RMSE reduction of 3 %–8 %, R2 improvement of 1 %–12 %, and CCC improvement of 0 %–5 %. However, the feature spectra selected by CARS did not improve the accuracy of MMoE relative to the full-band spectrum, whereas data augmentation improved the estimation accuracy of MMoE relative to raw spectra, with RMSE reduction of 14 %–28 %, R2 improvement of 3 %–88 %, and CCC improvement of 1 %–63 %. Consequently, this study demonstrates that MMoE combined with data augmentation is an efficient and accurate method for the simultaneous estimation of multiple soil properties from vis-NIR spectra.
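
To make the multi-gate idea in the abstract concrete, the following is a minimal NumPy sketch of a generic MMoE forward pass: a set of experts is shared by all soil properties, and each property has its own softmax gate that weights the expert outputs before a property-specific output layer. It is an illustration under stated assumptions, not the architecture used in the study. The single dense ReLU layer per expert, the linear towers, the layer sizes, the random weights, and the names `init_mmoe` and `mmoe_forward` are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def init_mmoe(n_bands, n_experts, n_tasks, hidden, rng):
    """Random weights for a toy MMoE (all sizes are illustrative assumptions)."""
    return {
        # One dense layer per expert; every expert is shared by all tasks.
        "expert_w": rng.normal(0.0, 0.1, (n_experts, n_bands, hidden)),
        # One gate per task, scoring every expert from the input spectrum.
        "gate_w": rng.normal(0.0, 0.1, (n_tasks, n_bands, n_experts)),
        # One linear tower per task, mapping mixed features to a scalar.
        "tower_w": rng.normal(0.0, 0.1, (n_tasks, hidden)),
        "tower_b": np.zeros(n_tasks),
    }


def mmoe_forward(x, p):
    """Forward pass. x has shape (batch, n_bands)."""
    # Expert features with ReLU: (batch, n_experts, hidden)
    experts = np.maximum(0.0, np.einsum("bd,edh->beh", x, p["expert_w"]))
    # Task-specific gate weights over experts: (batch, n_tasks, n_experts)
    gates = softmax(np.einsum("bd,tde->bte", x, p["gate_w"]), axis=-1)
    # Each task mixes the shared experts with its own gate: (batch, n_tasks, hidden)
    mixed = np.einsum("bte,beh->bth", gates, experts)
    # Task towers give one prediction per soil property: (batch, n_tasks)
    preds = np.einsum("bth,th->bt", mixed, p["tower_w"]) + p["tower_b"]
    return preds, gates


rng = np.random.default_rng(0)
spectra = rng.random((4, 200))  # 4 synthetic spectra with 200 bands each
params = init_mmoe(n_bands=200, n_experts=3, n_tasks=10, hidden=16, rng=rng)
preds, gates = mmoe_forward(spectra, params)
print(preds.shape)  # (4, 10)
print(gates.shape)  # (4, 10, 3)
```

A shared-bottom network corresponds to the special case of one expert, where every property receives identical features. With several experts and separate gates, two weakly correlated properties can rely on different experts, which is the mechanism the abstract credits for the accuracy gain over multi-CNN.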