Journal of Agriculture and Food Research (Sep 2023)

An automatic generation of pre-processing strategy combined with machine learning multivariate analysis for NIR spectral data

  • Nunik Destria Arianti,
  • Edo Saputra,
  • Agustami Sitorus

Journal volume & issue
Vol. 13
p. 100625

Abstract

Pre-processing of near-infrared (NIR) spectral data is indispensable in multivariate analysis, because the measured spectra of complex samples are often affected by overwhelming background, light scattering, varying noise, and other unexpected factors. Various pre-processing methods have been developed to remove or reduce these effects. Until now, most applications of NIR spectral pre-processing in multivariate calibration have relied on trial and error, with the choice of method depending on the nature of the data and on the practitioner's expertise and experience. It is therefore usually challenging to determine the best pre-processing method for a given dataset. To tackle this problem, this study proposes a new concept of data pre-processing, namely the automatic generation of a pre-processing strategy (AGoES). This concept belongs to the family of ensemble pre-processing methods, in which machine learning algorithms (PLSR, SVM, k-NN, DT, AB, and GPR) built on differently pre-processed data are combined using 5-fold cross-validation and grid search optimization. To evaluate the concept, a public NIR spectral dataset was used to predict three responses: dry matter content (DM), organic matter content (OM), and ammonium nitrogen content (AN) of manure organic waste. The results show that SVM combined with AGoES pre-processing is the best algorithm for predicting DM and AN, with a ratio of prediction to deviation (RPD) of 3.619 and 2.996, respectively. AB in tandem with AGoES pre-processing is the best strategy for predicting OM, with an RPD of 3.185. Within the framework of the AGoES concept, pre-processing is therefore unsupervised, simpler, and feasible to apply in multivariate analysis using machine learning algorithms.
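The general idea of choosing a pre-processing method by cross-validated grid search, rather than by trial and error, can be illustrated with a minimal sketch. This is not the authors' AGoES implementation: the abstract does not list the candidate pre-processing methods or the selection criterion, so the three candidates used here (raw, standard normal variate, first difference), the use of pooled cross-validation RMSE as the criterion, the synthetic data, and all function names are assumptions made for illustration. k-NN is used only because it is the simplest of the algorithms the abstract names. RPD is computed with the commonly used definition, the standard deviation of the reference values divided by the RMSE of prediction.

```python
import math
import random
import statistics


def identity(spectrum):
    """No pre-processing (hypothetical baseline candidate)."""
    return list(spectrum)


def snv(spectrum):
    """Standard normal variate: centre and scale a single spectrum."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(v - mean) / sd for v in spectrum]


def first_difference(spectrum):
    """First difference between adjacent wavelengths."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


# Assumed candidate set; the abstract does not say which methods AGoES uses.
PREPROCESSORS = {"raw": identity, "snv": snv, "diff1": first_difference}


def knn_predict(train_x, train_y, query, k):
    """Mean response of the k nearest training spectra (Euclidean distance)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return statistics.fmean([train_y[i] for i in order[:k]])


def rpd(y_true, y_pred):
    """RPD: standard deviation of reference values divided by RMSE."""
    rmse = math.sqrt(statistics.fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)]))
    return statistics.stdev(y_true) / rmse


def cv_rmse(x, y, k, n_folds=5):
    """RMSE pooled over the held-out samples of n_folds interleaved folds."""
    sq_errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(x)) if i % n_folds != fold]
        test = [i for i in range(len(x)) if i % n_folds == fold]
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        for i in test:
            sq_errors.append((knn_predict(tx, ty, x[i], k) - y[i]) ** 2)
    return math.sqrt(statistics.fmean(sq_errors))


def select_strategy(spectra, y, k_grid=(1, 3, 5)):
    """Grid search over (pre-processing, k) pairs; lowest CV RMSE wins."""
    results = {}
    for name, transform in PREPROCESSORS.items():
        x = [transform(s) for s in spectra]
        for k in k_grid:
            results[(name, k)] = cv_rmse(x, y, k)
    best = min(results, key=results.get)
    return best, results


def make_synthetic(n=40, n_wavelengths=30, seed=0):
    """Synthetic spectra (not real NIR data): one peak plus scatter and offset."""
    rng = random.Random(seed)
    spectra, y = [], []
    for _ in range(n):
        target = rng.uniform(5.0, 25.0)
        scatter = rng.uniform(0.8, 1.2)   # multiplicative scatter effect
        offset = rng.uniform(-0.5, 0.5)   # additive baseline shift
        spectra.append([
            scatter * (0.02 * j + target / 25.0 * math.exp(-((j - 15) ** 2) / 20.0))
            + offset + rng.gauss(0.0, 0.01)
            for j in range(n_wavelengths)
        ])
        y.append(target)
    return spectra, y
```

Calling `select_strategy(*make_synthetic())` returns the best `(pre-processing, k)` pair together with the full table of cross-validated RMSE values, so the alternatives can still be inspected. In a real study the RPD would be reported on an independent test set rather than on the data used for selection.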

Keywords