Case Studies in Chemical and Environmental Engineering (Dec 2024)

Prediction accuracy of near infrared spectroscopy coupled with adaptive machine learning methods for simultaneous determination of chlorogenic acid and caffeine on intact coffee beans

  • Agus Arip Munawar
  • Zulfahrizal
  • Daniel Mörlein

Journal volume & issue
Vol. 10
p. 100913

Abstract

Because near infrared spectroscopy (NIRS) data are inherently complex and high-dimensional, robust methods are needed to interpret the spectroscopic signals accurately when determining the concentration of chlorogenic acid (CGA) and caffeine in intact coffee beans. This work therefore evaluates the accuracy of four machine learning algorithms for quantifying CGA and caffeine: support vector machine regression (SVMR), ridge regression (RGR), partial least squares regression (PLSR) and extreme gradient boosting (XGBR). A total of 152 NIRS spectra from diverse intact coffee beans were analyzed, 104 in the calibration set and 48 in the prediction set. Each algorithm was used to model the spectral data against the reference chemical assays for CGA and caffeine content. The models were validated on the independent prediction set, and their performance was assessed primarily by the coefficient of determination (R²), the root mean square error (RMSE), the ratio of prediction to deviation (RPD) and the range to error ratio (RER). All four algorithms quantified both CGA and caffeine from the NIRS data, with varying accuracy: prediction R² ranged from 0.95 to 0.99 for CGA and from 0.75 to 0.97 for caffeine. XGBR performed best, yielding R² values of 0.97 for caffeine and 0.99 for CGA. PLSR, SVMR and RGR showed commendable predictive capability but trailed the accuracy achieved by XGBR.
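
The four performance indexes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas the authors used, so the definitions below are assumptions based on common chemometrics conventions: RPD is taken as the standard deviation of the reference values divided by the RMSE, and RER as the range of the reference values divided by the RMSE. The function name `prediction_metrics` and the toy numbers are hypothetical and are not data from the study.

```python
import numpy as np


def prediction_metrics(y_true, y_pred):
    """Compute R2, RMSE, RPD and RER for a prediction set.

    Assumed definitions (not confirmed by the source):
      R2   = 1 - SS_res / SS_tot
      RMSE = sqrt(mean((y_true - y_pred)^2))
      RPD  = sample SD of reference values / RMSE
      RER  = (max - min of reference values) / RMSE
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    residuals = y_true - y_pred
    ss_res = np.sum(residuals ** 2)                    # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares

    rmse = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - ss_res / ss_tot
    rpd = np.std(y_true, ddof=1) / rmse                # ddof=1: sample SD
    rer = (y_true.max() - y_true.min()) / rmse

    return {"r2": r2, "rmse": rmse, "rpd": rpd, "rer": rer}


if __name__ == "__main__":
    # Made-up reference and predicted values, for illustration only.
    reference = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    for name, value in prediction_metrics(reference, predicted).items():
        print(f"{name}: {value:.3f}")
```

In practice the same function would be applied separately to the CGA and caffeine predictions of each model on the 48-spectrum prediction set, so that the four algorithms can be compared on identical terms.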

Keywords