Journal of Integrative Agriculture (Jun 2023)
Ensemble learning prediction of soybean yields in China based on meteorological data
Abstract
Accurate prediction of soybean yield is of great significance for agricultural production, monitoring, and early warning. Although previous studies have used machine learning algorithms to predict soybean yield from meteorological data, it remains unclear how different models can be used to effectively separate soybean meteorological yield from soybean yield in various regions. In addition, the use of ensemble learning to integrate the advantages of multiple machine learning algorithms and thereby improve prediction accuracy has not been studied in depth. This study analyzed daily meteorological data and soybean yield data covering 34 years from 173 county-level administrative regions and meteorological stations in the two principal soybean planting areas of China (Northeast China and the Huang–Huai region). Three effective machine learning algorithms (K-nearest neighbor, random forest, and support vector regression) were adopted as base models to establish a high-precision and highly reliable soybean meteorological yield prediction model within the stacking ensemble learning framework. The model's generalizability was further improved through 5-fold cross-validation, and the model was optimized by principal component analysis and hyperparameter optimization. Model accuracy was evaluated using 5-year sliding prediction and four regression indicators across the 173 counties, which showed that the stacking model has higher accuracy and stronger robustness. The 5-year sliding estimations of soybean yield from the stacking model in the 173 counties showed that the predictions reflect the spatiotemporal distribution of soybean yield in detail, with a mean absolute percentage error (MAPE) of less than 5%. The stacking prediction model of soybean meteorological yield thus provides a new approach for accurately predicting soybean yield.
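The stacking framework described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. It is not the authors' implementation. The data are synthetic stand-ins for the meteorological features and yields, the linear meta-learner is an assumption (the abstract does not name one), and the hyperparameter values and the 95% PCA variance threshold are illustrative choices only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): 300 samples with 12 "meteorological"
# features and a yield-like target centred near 2.0.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
y = 2.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(scale=0.05, size=300)

# Simple hold-out split: first 240 samples for training, last 60 for testing.
X_train, X_test = X[:240], X[240:]
y_train, y_test = y[:240], y[240:]

# The three base models named in the abstract; hyperparameters are illustrative.
base_models = [
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("svr", SVR(C=1.0, epsilon=0.01)),
]

# cv=5 makes the meta-learner train on 5-fold out-of-fold base predictions.
# The linear meta-learner is an assumption, not taken from the paper.
stack = StackingRegressor(
    estimators=base_models,
    final_estimator=LinearRegression(),
    cv=5,
)

# Standardize, reduce dimensionality with PCA (keep 95% of variance), then stack.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), stack)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mape = mean_absolute_percentage_error(y_test, y_pred)  # returned as a fraction
print(f"MAPE on held-out synthetic data: {mape:.2%}")
```

Hyperparameter optimization could be layered on top, for example by wrapping `model` in scikit-learn's `GridSearchCV`. A sliding evaluation would repeat the fit and predict steps over successive year windows rather than using a single hold-out split.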