Frontiers in Plant Science (Jan 2025)

Fruit size prediction of tomato cultivars using machine learning algorithms

  • Masaaki Takahashi,
  • Yasushi Kawasaki,
  • Hiroki Naito,
  • Unseok Lee,
  • Koichi Yoshi

DOI
https://doi.org/10.3389/fpls.2025.1516255
Journal volume & issue
Vol. 16

Abstract

Early fruit size prediction in greenhouse tomato (Solanum lycopersicum L.) is crucial for growers managing cultivars to reduce the yield ratio of small-sized fruit and for stakeholders in the horticultural supply chain. We aimed to develop a method for early prediction of tomato fruit size at harvest using machine learning algorithms; three models (Ridge Regression, Extra Tree Regression, CatBoost Regression) were compared using the PyCaret package for Python. To construct the models, fruit weight estimated from fruit diameter measured over time at each cumulative temperature after anthesis was used as the explanatory variable, and fruit weight at harvest was used as the objective variable. Datasets for two different prediction periods after anthesis of three tomato cultivars (“CF Momotaro York,” “Zayda,” and “Adventure”) were used to develop tomato size prediction models, and their performance was evaluated. We also aimed to improve the models by adding the average temperature during the prediction period as an explanatory variable. When the estimated fruit size data at cumulative temperatures of 200°C d, 300°C d, and 500°C d after anthesis were used as explanatory variables, the mean absolute percentage error (MAPE) was lowest for “Zayda,” a cultivar with stable fruit diameter, at 9.8% for Ridge Regression. When the estimated fruit size at cumulative temperatures of 300°C d, 500°C d, and 800°C d after anthesis was used as the explanatory variables for Ridge Regression, the MAPE decreased for all cultivars: 10.1% for “CF Momotaro York,” 8.8% for “Zayda,” and 10.0% for “Adventure.” In addition, incorporating the average temperature during the fruit size prediction period as an explanatory variable slightly improved model performance. These results indicate that this method could effectively predict tomato size at harvest in three cultivars. If fruit diameter data acquisition could be automated or simplified, it would assist in cultivation management, such as tomato thinning.
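
The two quantities the abstract relies on, cumulative temperature after anthesis (°C d) and MAPE, can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that cumulative temperature is the running sum of daily mean air temperatures with a base temperature of 0 °C, which the abstract does not state, and the function names `day_reaching` and `mape` are hypothetical. All numbers are made-up toy values, not data from the study.

```python
from itertools import accumulate


def day_reaching(daily_mean_temps, threshold):
    """Return the 1-based day after anthesis on which the cumulative
    temperature (running sum of daily means, in degC d) first reaches
    `threshold`, or None if it is never reached.

    Assumption: base temperature of 0 degC (not stated in the abstract).
    """
    for day, total in enumerate(accumulate(daily_mean_temps), start=1):
        if total >= threshold:
            return day
    return None


def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = (abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * sum(errors) / len(actual)


# Toy example: a constant 20 degC daily mean for 30 days after anthesis.
temps = [20.0] * 30
measurement_days = {t: day_reaching(temps, t) for t in (200, 300, 500)}

# Toy harvest weights (g) against hypothetical model predictions.
error_pct = mape([200.0, 100.0], [180.0, 110.0])
```

In this toy setting the 200, 300, and 500 °C d thresholds fall on days 10, 15, and 25, and each prediction is off by 10% of the actual weight, so the MAPE is about 10%, the same order as the errors reported above.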

Keywords