Agriculture (Sep 2024)

Rice Yield Estimation Using Machine Learning and Feature Selection in Hilly and Mountainous Chongqing, China

  • Li Fan,
  • Shibo Fang,
  • Jinlong Fan,
  • Yan Wang,
  • Linqing Zhan,
  • Yongkun He

DOI
https://doi.org/10.3390/agriculture14091615
Journal volume & issue
Vol. 14, no. 9
p. 1615

Abstract

To investigate effective techniques for estimating rice production in hilly and mountainous areas, we collected field-level yield data, agro-meteorological data, and Sentinel-2/MSI remote sensing data in Chongqing, China, between 2020 and 2023. The integral values of vegetation indices from the rice greening-up stage to the heading–filling stage were determined using the Newton–trapezoidal integration method. Using correlation analysis and permutation feature importance analysis, the effects of agro-meteorological variables and vegetation index integrals on rice yield were assessed. The selected features were then combined with three machine learning techniques, namely random forest (RF), support vector machine (SVM), and partial least squares regression (PLSR), to create six rice yield estimation models. The results showed that vegetation indices integrated over multiple growth stages were more effective than indices from individual development stages. Specifically, the correlation coefficients between rice yield and the integral values of eight vegetation indices from the greening-up stage to the heading–filling stage were all above 0.65. Agro-meteorological factors were then introduced as additional independent variables and combined with the vegetation indices as model inputs, and the predictive capability of the models was evaluated. The performance of PLSR remained stable, while the prediction accuracies of SVM and RF improved by 13% to 21.5%. After feature selection, the inversion performance of all three machine learning models improved. The RF model coupled with variables selected by permutation feature importance analysis achieved the best inversion performance, with a coefficient of determination of 0.85, a root mean square error of 529.1 kg/hm², and a mean relative error of 5.63%.
This study provides technical support for improving the accuracy of remote sensing-based crop yield estimation in hilly and mountainous regions, facilitating precise agricultural management and informing agrarian decision making.
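
As a rough illustration of two steps named in the abstract, the sketch below integrates a vegetation-index time series with the plain composite trapezoidal rule and computes the three reported accuracy metrics. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the sample NDVI values, and the sample yields are hypothetical, the "Newton–trapezoidal" method used in the paper may differ in detail, and the metric definitions (R² as 1 − SS_res/SS_tot, mean relative error as the mean absolute relative error in percent) are assumed rather than taken from the paper.

```python
import math


def trapezoid_integral(days, values):
    """Integrate a vegetation-index series over time (composite trapezoidal rule)."""
    if len(days) != len(values) or len(days) < 2:
        raise ValueError("need at least two (day, value) pairs of equal length")
    total = 0.0
    for i in range(1, len(days)):
        width = days[i] - days[i - 1]  # days between two acquisitions
        total += 0.5 * width * (values[i] + values[i - 1])
    return total


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the yield data."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_relative_error(observed, predicted):
    """Mean absolute relative error in percent (assumed definition)."""
    n = len(observed)
    return 100.0 * sum(abs(o - p) / o for o, p in zip(observed, predicted)) / n


# Hypothetical NDVI samples between greening-up and heading-filling.
days = [0, 10, 20, 30]
ndvi = [0.2, 0.4, 0.6, 0.8]
ndvi_integral = trapezoid_integral(days, ndvi)

# Hypothetical observed and estimated yields in kg/hm².
observed = [8000.0, 9000.0, 10000.0]
predicted = [8400.0, 9000.0, 9500.0]
print(ndvi_integral)
print(r_squared(observed, predicted))
print(rmse(observed, predicted))
print(mean_relative_error(observed, predicted))
```

In a workflow like the one described, an integral of this kind would be computed per field and per vegetation index and then used as a candidate input feature for the regression models.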

Keywords