Frontiers in Plant Science (Jun 2024)

Estimation of soybean yield based on high-throughput phenotyping and machine learning

  • Xiuni Li,
  • Menggen Chen,
  • Shuyuan He,
  • Xiangyao Xu,
  • Lingxiao He,
  • Li Wang,
  • Yang Gao,
  • Fenda Tang,
  • Tao Gong,
  • Wenyan Wang,
  • Mei Xu,
  • Chunyan Liu,
  • Liang Yu,
  • Weiguo Liu,
  • Wenyu Yang

DOI
https://doi.org/10.3389/fpls.2024.1395760
Journal volume & issue
Vol. 15

Abstract

Introduction: Soybeans are an important crop used for food, oil, and feed. However, China's soybean self-sufficiency is highly inadequate, with an annual import dependence exceeding 80%. RGB cameras are powerful tools for estimating crop yield, and machine learning offers a practical way to combine many image features into improved yield predictions. The choice of input parameters and models, specifically which features are used and how each model performs, significantly influences soybean yield prediction.

Methods: This study used an RGB camera to capture soybean canopy images from both side and top perspectives during the R6 stage (pod filling stage) for 240 soybean varieties (a natural population drawn from four provinces in China: Sichuan, Yunnan, Chongqing, and Guizhou). Morphological, color, and textural features of the soybeans were extracted from these images. Feature selection was then performed on the image parameters using a Pearson correlation coefficient threshold of ≥0.5. Five machine learning methods, namely CatBoost, LightGBM, random forest (RF), gradient boosting decision tree (GBDT), and multilayer perceptron (MLP), were used to build soybean yield estimation models based on the individual and combined image parameters from the two perspectives.

Results: (1) GBDT is the optimal model for predicting soybean yield, with a test set R² of 0.82, an RMSE of 1.99 g/plant, and an MAE of 3.12%. (2) Fusing multiangle and multitype indicators improves soybean yield prediction accuracy.

Conclusion: Combining parameters extracted from RGB images with machine learning has great potential for estimating soybean yield, providing a theoretical basis and technical support for accelerating the soybean breeding process.
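The feature-selection step and the reported error metrics can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the feature names and numbers are made-up toy values, the correlation is assumed to be computed against yield, and the threshold is assumed to apply to the absolute value of the coefficient. The metrics follow the standard definitions of R², RMSE, and MAE; the abstract reports MAE as a percentage, and the normalization behind that figure is not stated, so plain MAE is shown here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, threshold=0.5):
    """Keep feature names whose |r| with the target meets the threshold.

    Using the absolute value is an assumption; the source only states
    a threshold of >= 0.5.
    """
    return [
        name
        for name, values in features.items()
        if abs(pearson_r(values, target)) >= threshold
    ]


def regression_metrics(y_true, y_pred):
    """Standard R^2, RMSE, and MAE for a set of predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical toy data: yield in g/plant and three invented image features.
yield_g = [10.0, 12.0, 14.0, 16.0, 18.0]
features = {
    "canopy_area": [1.0, 2.0, 3.0, 4.0, 5.0],    # rises with yield
    "texture_noise": [3.0, 1.0, 2.0, 1.0, 3.0],  # unrelated to yield
    "green_ratio": [5.0, 4.0, 3.0, 2.0, 1.0],    # falls as yield rises
}

selected = select_features(features, yield_g, threshold=0.5)

# Invented predictions standing in for the output of a fitted model.
predicted = [11.0, 12.0, 13.0, 16.0, 19.0]
metrics = regression_metrics(yield_g, predicted)

print(selected)
print(metrics)
```

In a fuller workflow, the selected columns would be passed to a regressor such as a gradient boosting model, and the metrics would be computed on a held-out test set rather than on invented predictions.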

Keywords