Scientific Reports (Jan 2025)

Efficient and accurate determination of the degree of substitution of cellulose acetate using ATR-FTIR spectroscopy and machine learning

  • Frank Rhein,
  • Timo Sehn,
  • Michael A. R. Meier

DOI
https://doi.org/10.1038/s41598-025-86378-0
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 11

Abstract

Multiple linear regression models were trained to predict the degree of substitution (DS) of cellulose acetate from raw infrared (IR) spectroscopic data. A repeated k-fold cross-validation ensured an unbiased assessment of model accuracy. Using the DS obtained from 1H NMR data as reference, the machine learning model achieved a mean absolute error (MAE) of 0.069 in DS on test data, a higher accuracy than the manual evaluation based on peak integration. Limiting the model to physically relevant areas unexpectedly showed the C–H peak to be the strongest predictor of DS. By applying an n-best feature selection algorithm based on the F-statistic of the Pearson correlation coefficient, several relevant areas were identified, and the optimized model achieved an improved MAE of 0.052. Predicting the DS of other cellulose acetate data sets yielded similar accuracy, demonstrating that the developed models are robust and suitable for efficient and accurate routine evaluations. The model trained solely on cellulose acetate was further able to predict the DS of other cellulose esters with an accuracy of approximately 0.1 to 0.2 in DS, and model architectures for a more general analysis of cellulose esters were proposed.
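The n-best feature selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pearson_r`, `f_statistic`, `select_n_best`, `mean_absolute_error`) and the synthetic five-channel "spectra" are assumptions made purely for illustration. The sketch assumes the usual univariate regression F-test, in which a feature with Pearson correlation r to the target over n samples is scored as F = r² / (1 − r²) · (n − 2), and the n highest-scoring channels are kept.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant channel carries no information
    return sxy / math.sqrt(sxx * syy)


def f_statistic(x, y):
    """Univariate F-score derived from the Pearson correlation."""
    r2 = pearson_r(x, y) ** 2
    if r2 >= 1.0:
        return math.inf
    return r2 / (1.0 - r2) * (len(x) - 2)


def select_n_best(spectra, ds, n_best):
    """Return the indices of the n_best channels and all F-scores."""
    n_features = len(spectra[0])
    scores = [
        f_statistic([row[j] for row in spectra], ds) for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_best]), scores


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error, the accuracy metric quoted in the abstract."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (assumption): 25 samples, 5 channels. Channels 1 and 3
# depend linearly on DS; channels 0, 2 and 4 are pure noise.
rng = random.Random(0)
ds = [0.5 + 0.1 * i for i in range(25)]
spectra = [
    [
        rng.gauss(0, 1),
        0.8 * d + rng.gauss(0, 0.02),
        rng.gauss(0, 1),
        -0.5 * d + rng.gauss(0, 0.02),
        rng.gauss(0, 1),
    ]
    for d in ds
]

selected, scores = select_n_best(spectra, ds, n_best=2)
print("selected channels:", selected)
```

In a real workflow the selected channels would then feed a multiple linear regression, with the selection repeated inside each cross-validation fold so that test data does not influence which channels are chosen.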

Keywords