IEEE Access (Jan 2025)

Explainable Climate-Based Time Series Modeling for Predicting Chemical Compositions in Tobacco Leaves

  • Hong He,
  • Yunwei Zhang,
  • Bin Li,
  • Chengjin Tao

DOI
https://doi.org/10.1109/access.2025.3564289
Journal volume & issue
Vol. 13
pp. 76352 – 76369

Abstract

The quality of tobacco leaves significantly influences cigarette flavor and market value, with chemical composition serving as a critical quality indicator. However, existing tobacco quality prediction studies mainly rely on physical leaf samples, resulting in significant time lags and limited applicability for early decision-making. To address this, we construct a $153\times 6$ climate factor matrix covering the tobacco growth period (May 1 to September 30), incorporating daily maximum temperature, minimum temperature, mean temperature, precipitation, mean sunlight intensity, and atmospheric pressure. Notably, atmospheric pressure is introduced for the first time to enhance model generalizability. We develop and compare five predictive models: multiple linear regression (MLR), eXtreme Gradient Boosting (XGBoost), long short-term memory (LSTM), gated recurrent unit (GRU), and a convolutional neural network combined with LSTM (CNN+LSTM). The models are trained and validated using climate and chemical composition data from 98 tobacco-growing counties in Yunnan Province (2013-2021). Experimental results demonstrate that the CNN+LSTM model achieves the highest predictive accuracy of the five, effectively capturing complex spatiotemporal interactions in climate factors. The mean absolute percentage errors (MAPE) for total sugar, reducing sugar, and total nitrogen remain within 10%-20%, while those for nicotine and potassium fall in the range of 20%-30%. Furthermore, Integrated Gradients (IG) analysis is employed to interpret the CNN+LSTM model, revealing the contribution of individual climate factors to chemical accumulation patterns. Because it predicts from climate data rather than from harvested leaf samples, our approach reduces the time lag of existing studies, helps producers plan resources in advance, and provides a data-driven basis for optimizing tobacco cultivation and quality management.
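
To make the input shape and the error metric concrete, the following is a minimal Python sketch, not the authors' implementation. It shows how daily readings from May 1 to September 30 stack into a 153 x 6 matrix and how MAPE is computed. The factor names, the `daily_records` mapping, and all function names are hypothetical. The readings are zero-valued placeholders, not data from the study.

```python
from datetime import date, timedelta

# Assumed column order for the six daily climate factors named in the abstract.
FACTORS = ["t_max", "t_min", "t_mean", "precip", "sunlight", "pressure"]


def growth_period_days(year):
    """Return every calendar date from May 1 to September 30 of the given year."""
    start, end = date(year, 5, 1), date(year, 9, 30)
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def build_climate_matrix(daily_records, year):
    """Stack per-day factor readings into a (days x factors) nested list.

    daily_records is a hypothetical mapping {date: {factor_name: value}}.
    """
    return [[daily_records[d][f] for f in FACTORS] for d in growth_period_days(year)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Placeholder readings (all zeros), used only to show the resulting shape.
records = {d: {f: 0.0 for f in FACTORS} for d in growth_period_days(2021)}
matrix = build_climate_matrix(records, 2021)
shape = (len(matrix), len(matrix[0]))  # 153 days by 6 factors
```

The 153 rows follow directly from the calendar (31 + 30 + 31 + 31 + 30 days), so the shape is the same in leap and non-leap years. In this sketch, a reported MAPE of 10%-20% corresponds to `mape` returning a value between 10 and 20.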

Keywords