Journal of Agriculture and Food Research (Dec 2024)

Explainable extreme gradient boosting as a machine learning tool for discrimination of the geographical origin of chili peppers using laser ablation-inductively coupled plasma mass spectrometry, X-ray fluorescence, and near-infrared spectroscopy

  • Seongsoo Jeong,
  • Yong-kyoung Kim,
  • Suel Hye Hur,
  • Hyojoo Bang,
  • HoJin Kim,
  • Hoeil Chung

Journal volume & issue
Vol. 18
p. 101446

Abstract

Chili pepper samples were discriminated by geographical origin using spectroscopic techniques coupled with machine learning. First, laser ablation-inductively coupled plasma mass spectrometry (LA-ICP-MS), X-ray fluorescence (XRF), and near-infrared (NIR) spectroscopy were chosen for simple and rapid sample measurements. Second, to secure discrimination accuracy, eXtreme Gradient Boosting (XGBoost), a tree-based ensemble technique, was adopted as a candidate classifier. In addition, for explainable machine learning modeling, SHapley Additive exPlanations (SHAP) values of the employed variables were calculated to assess how each variable contributes to the discrimination. XGBoost improved discrimination accuracy for all three measurement techniques compared with k-nearest neighbor (k-NN), support vector machine (SVM), and partial least squares-discriminant analysis (PLS-DA). The accuracy was 96.2 % using the LA-ICP-MS data. When the XRF and NIR data were combined, the accuracy improved to 97.5 %. The accuracy improvement was attributed to the combination of complementary atomic and molecular spectroscopic signatures of the samples.
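
As a rough illustration of the attribution idea behind SHAP values, the sketch below computes exact Shapley values for a single sample of a toy three-variable model. It is not the authors' XGBoost pipeline: the function `toy_score`, the sample `x`, and the all-zero `baseline` are invented for illustration, and practical SHAP analyses of tree ensembles typically rely on a dedicated, faster algorithm rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of each variable for one sample.

    Variables outside a coalition are set to their baseline value.
    The cost grows exponentially with the number of variables, so this
    is only practical for very small examples.
    """
    n = len(x)

    def value(coalition):
        # Variables in the coalition keep the sample's value;
        # all others are replaced by the baseline value.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude variable i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]         # hypothetical sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference point

phi = shapley_values(toy_score, x, baseline)
for index, contribution in enumerate(phi):
    print(f"variable {index}: {contribution:.3f}")
```

In this toy case the additive variable receives its full effect, the two interacting variables share their joint effect equally, and the contributions sum to the difference between the score for the sample and the score for the baseline. That additivity is what allows per-variable contributions to be read as an explanation of a single prediction.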

Keywords