Foods (Apr 2024)

Machine Learning Model Stability for Sub-Regional Classification of Barossa Valley Shiraz Wine Using A-TEEM Spectroscopy

  • Han Wang,
  • David W. Jeffery

DOI
https://doi.org/10.3390/foods13091376
Journal volume & issue
Vol. 13, no. 9
p. 1376

Abstract

The absorbance–transmission and fluorescence excitation–emission matrix (A-TEEM) technique, which is based on the molecular fingerprinting of a sample, has shown great potential for maintaining the reputation of wine-producing regions among consumers, minimising economic losses caused by wine fraud, and enabling data-driven terroir classification. However, the effects of changes in wine composition due to ageing and the stability of A-TEEM models over time had not been addressed, and the classification of wine blends required investigation. Thus, A-TEEM data were combined with an extreme gradient boosting discriminant analysis (XGBDA) algorithm to build classification models based on a range of Shiraz research wines (n = 217) from five Barossa Valley sub-regions over four vintages that had aged in bottle for several years. For vintage year, this spectral fingerprinting and machine learning approach gave a class prediction accuracy of 100% based on cross-validation (CV) and an unknown sample prediction accuracy of 98.8% when the wine samples were split into training and test sets to obtain the classification models. For sub-regional production area, the CV class prediction accuracy was 99.5% and the unknown sample prediction accuracy was 93.8% when modelling with the split dataset. Inputting a sub-set of the current A-TEEM data into the models generated previously for these Barossa sub-region wines yielded 100% accurate prediction of vintage year for 2018–2020 wines, 92% accuracy for sub-region for 2018 wines, and 91% accuracy for sub-region using 2021 wine spectral data that were not included in the original modelling. Satisfactory results were also obtained from the modelling and prediction of blended samples for the vintages and sub-regions, which is significant given the practice of wine blending.
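The abstract reports two kinds of accuracy: one from cross-validation within the modelling data and one from predicting "unknown" samples held out in a test set. The following minimal Python sketch illustrates only that distinction. It is not the authors' method: the data are synthetic stand-ins for spectral features, a simple nearest-centroid classifier is swapped in for XGBDA, and all function names, sample sizes, and fold counts are assumptions chosen for illustration.

```python
import random
from collections import defaultdict


def make_synthetic(n_per_class=40, n_classes=5, n_features=8, seed=0):
    """Synthetic stand-in for spectral features; NOT real A-TEEM data."""
    rng = random.Random(seed)
    X, y = [], []
    for c in range(n_classes):
        centre = [rng.uniform(-3, 3) for _ in range(n_features)]
        for _ in range(n_per_class):
            X.append([m + rng.gauss(0, 1) for m in centre])
            y.append(c)
    return X, y


def stratified_folds(y, k, seed=0):
    """Assign each sample index to one of k folds, balancing the classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    return folds


def fit_centroids(X, y, idxs):
    """Nearest-centroid 'model' (a simple stand-in, not XGBDA)."""
    sums, counts = {}, defaultdict(int)
    for i in idxs:
        if y[i] not in sums:
            sums[y[i]] = [0.0] * len(X[i])
        sums[y[i]] = [s + v for s, v in zip(sums[y[i]], X[i])]
        counts[y[i]] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)),
    )


def accuracy(X, y, train, test):
    """Fit on the train indices and score on the test indices."""
    model = fit_centroids(X, y, train)
    hits = sum(predict(model, X[i]) == y[i] for i in test)
    return hits / len(test)


X, y = make_synthetic()
folds = stratified_folds(y, k=5)

# Hold out one fold as "unknown" samples; cross-validate on the rest.
test_idx = folds[0]
train_folds = folds[1:]
train_idx = [i for f in train_folds for i in f]

cv_scores = []
for j, val in enumerate(train_folds):
    fit = [i for m, f in enumerate(train_folds) if m != j for i in f]
    cv_scores.append(accuracy(X, y, fit, val))

cv_accuracy = sum(cv_scores) / len(cv_scores)
test_accuracy = accuracy(X, y, train_idx, test_idx)

print(f"CV accuracy: {cv_accuracy:.3f}")
print(f"Held-out accuracy: {test_accuracy:.3f}")
```

In this sketch the held-out samples never influence model fitting or the cross-validation estimate, which is why the two figures can differ, as they do in the results reported above.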

Keywords