Applied Sciences (Aug 2022)

Averaging and Stacking Partial Least Squares Regression Models to Predict the Chemical Compositions and the Nutritive Values of Forages from Spectral Near Infrared Data

  • Mathieu Lesnoff,
  • Donato Andueza,
  • Charlène Barotin,
  • Philippe Barre,
  • Laurent Bonnal,
  • Juan Antonio Fernández Pierna,
  • Fabienne Picard,
  • Philippe Vermeulen,
  • Jean-Michel Roger

DOI
https://doi.org/10.3390/app12157850
Journal volume & issue
Vol. 12, no. 15
p. 7850

Abstract

Partial least squares regression (PLSR) is a reference statistical model in chemometrics. In agronomy, it is used to predict components of the chemical composition of plant materials (response variables y) from near-infrared (NIR) spectral data X collected with spectrometers. PLSR reduces the dimension of the spectral data X by defining vectors that are then used as latent variables (LVs) in a multiple linear model. One difficulty is determining the relevant dimensionality (number of LVs) for the given data. This step can be very time-consuming when many datasets have to be processed or when the datasets are frequently updated. The paper focuses on an alternative that bypasses the determination of the PLSR dimensionality and allows the predictions to be automated. The strategy uses ensemble learning methods, such as averaging or stacking the predictions of a set of PLSR models with different dimensionalities. The paper presents various methods of PLSR averaging and stacking and compares their performance to that of the usual PLSR on six real datasets covering different types of forages. The main finding of the study was the overall superiority of the averaging methods over the usual PLSR. We therefore believe that such methods can be recommended for analyzing NIR data on forages.
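
To make the averaging idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It fits a single-response PLSR with the textbook NIPALS algorithm, derives the predictions of the models with 0, 1, ..., max_lv latent variables, and averages them. The function names (`pls1_coefs`, `pls_predict_all`, `pls_average`), the uniform weights, and the synthetic data are all assumptions made for illustration; the paper compares several weighting and stacking schemes that are not reproduced here.

```python
import numpy as np


def pls1_coefs(X, y, max_lv):
    """Fit PLS1 (NIPALS) and return (x_mean, y_mean, B).

    B[:, a] holds the regression coefficients of the model with a LVs,
    for a = 0..max_lv. Column 0 is all zeros (the model predicts mean(y)).
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    p = Xc.shape[1]
    W = np.zeros((p, max_lv))   # weight vectors
    P = np.zeros((p, max_lv))   # X loadings
    q = np.zeros(max_lv)        # y loadings
    B = np.zeros((p, max_lv + 1))
    for a in range(max_lv):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)
        t = Xc @ w                      # score vector (latent variable)
        tt = t @ t
        P[:, a] = Xc.T @ t / tt
        q[a] = yc @ t / tt
        W[:, a] = w
        Xc = Xc - np.outer(t, P[:, a])  # deflate X
        yc = yc - q[a] * t              # deflate y
        Wa, Pa = W[:, : a + 1], P[:, : a + 1]
        B[:, a + 1] = Wa @ np.linalg.solve(Pa.T @ Wa, q[: a + 1])
    return x_mean, y_mean, B


def pls_predict_all(X, y, X_new, max_lv):
    """Predictions of every model with 0..max_lv LVs, shape (m, max_lv + 1)."""
    x_mean, y_mean, B = pls1_coefs(X, y, max_lv)
    return (X_new - x_mean) @ B + y_mean


def pls_average(X, y, X_new, max_lv):
    """Uniformly weighted average of the 0..max_lv LV predictions (assumed weights)."""
    return pls_predict_all(X, y, X_new, max_lv).mean(axis=1)


# Synthetic stand-in for spectra (rows = samples, columns = wavelengths).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 10))
y_train = X_train @ rng.normal(size=10) + 0.1 * rng.normal(size=40)
X_test = rng.normal(size=(5, 10))

y_hat = pls_average(X_train, y_train, X_test, max_lv=5)
```

With this approach, only an upper bound `max_lv` has to be chosen, rather than a single optimal number of LVs for each dataset.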

Keywords