Fermentation (Mar 2023)

Machine Learning Models Using Data Mining for Biomass Production from Yarrowia lipolytica Fermentation

  • Nattha Pensupa
  • Treesukon Treebuppachartsakul
  • Suejit Pechprasarn

DOI
https://doi.org/10.3390/fermentation9030239
Journal volume & issue
Vol. 9, no. 3
p. 239

Abstract


In this paper, a database of biomass production from Yarrowia lipolytica fermentation is constructed and analyzed using machine learning and data mining approaches. The database is curated from 15 publications and consists of 301 rows of data, each with 25 predictors and 1 label. The predictors include fermentation conditions such as inoculum size, temperature, pH, and time, while the label is the corresponding biomass production. The database is divided into training, validation, and test datasets, and biomass prediction is treated as a supervised regression task. Twenty-six regression models are trained and compared on their ability to predict biomass production. The best-performing model is Gaussian process regression with a Matern 5/2 kernel, which achieves the lowest root-mean-squared error (0.75 g/L), the highest coefficient of determination (R² = 0.90), and the lowest mean absolute error (0.52 g/L) among the 26 models. A t-test is used to identify the most important predictors, and 14 of the 25 predictors are sufficient to build an accurate model: fermentation time, peptone, temperature, total Kjeldahl nitrogen, shaking rate, total nitrogen, inoculum size, yeast extract, crude glycerol, glucose, oil and grease, media pH, ammonium sulfate, and olive oil. This research demonstrates how machine learning and data mining can be used to estimate biomass production and gives insight into which parameters are essential for Yarrowia lipolytica fermentation.
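The abstract does not describe implementation details. As an illustration only, the following Python sketch shows what Gaussian process regression with a Matern 5/2 kernel, a training/validation/test split, and the three reported metrics (RMSE, R², MAE) can look like. It is not the authors' code. It uses NumPy and synthetic data with three hypothetical predictors instead of the curated database. It also fixes the noise variance at an assumed value and picks the kernel length scale by a small grid search on the validation set. Full GPR implementations usually fit these hyperparameters by maximizing the marginal likelihood instead. The function names `matern52`, `gp_predict`, and `metrics` are hypothetical.

```python
import numpy as np


def matern52(A, B, length_scale=1.0, signal_var=1.0):
    """Matern 5/2 kernel: s2 * (1 + sqrt(5) r / l + 5 r^2 / (3 l^2)) * exp(-sqrt(5) r / l)."""
    # Pairwise Euclidean distances between the rows of A and the rows of B.
    r = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    s = np.sqrt(5.0) * r / length_scale
    return signal_var * (1.0 + s + s**2 / 3.0) * np.exp(-s)


def gp_predict(X_train, y_train, X_new, length_scale, noise_var=0.25):
    """Posterior mean of a zero-mean GP fitted to centred targets.

    Hyperparameters are fixed: noise_var is an assumed value, not a fitted one.
    """
    y_mean = y_train.mean()
    signal_var = y_train.var()  # assumption: signal variance taken from the targets
    K = matern52(X_train, X_train, length_scale, signal_var) + noise_var * np.eye(len(X_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))  # K^{-1} (y - mean)
    return y_mean + matern52(X_new, X_train, length_scale, signal_var) @ alpha


def metrics(y_true, y_pred):
    """Return (RMSE, R^2, MAE)."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err**2))
    r2 = 1.0 - np.sum(err**2) / np.sum((y_true - y_true.mean()) ** 2)
    mae = np.mean(np.abs(err))
    return rmse, r2, mae


# Synthetic stand-in data (NOT the curated database): 301 rows, 3 hypothetical predictors.
rng = np.random.default_rng(0)
n = 301
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = 10.0 * np.sin(3.0 * X[:, 0]) + 5.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 0.5, n)

# Random training / validation / test split (the 211/45/45 sizes are an assumption).
idx = rng.permutation(n)
tr, va, te = idx[:211], idx[211:256], idx[256:]

# Standardize predictors using training-set statistics only.
mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
Xs = (X - mu) / sd

# Pick the kernel length scale with the lowest validation RMSE.
grid = [0.3, 1.0, 3.0]
best_ls = min(grid, key=lambda ls: metrics(y[va], gp_predict(Xs[tr], y[tr], Xs[va], ls))[0])

# Refit on training + validation data, then score once on the held-out test set.
trva = np.concatenate([tr, va])
rmse, r2, mae = metrics(y[te], gp_predict(Xs[trva], y[trva], Xs[te], best_ls))
print(f"length scale = {best_ls}, RMSE = {rmse:.2f}, R^2 = {r2:.2f}, MAE = {mae:.2f}")
```

Running the script prints the selected length scale and the test-set RMSE, R², and MAE. These numbers reflect only the synthetic data and say nothing about the published results. Keeping the test set untouched until the final step mirrors the role the abstract gives to the separate test dataset.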

Keywords