Information Processing in Agriculture (Dec 2024)
Plot level sugarcane yield estimation by machine learning on multispectral images: A case study of Bundaberg, Australia
Abstract
Early crop yield prediction provides critical information for Precision Agriculture (PA) procedures, policymaking, and food security. The availability of Remote Sensing (RS) datasets and Machine Learning (ML) approaches has improved the prediction of sugarcane crop yield at local and global scales, but additional effort is required for prediction at the plot scale. Challenges for plot-level prediction include the high ratooning capacity of the sugarcane crop, the lack of high spatial resolution data during the critical growth stages, and the non-linear complexity of yield data. The principal objective of this study is to analyse the potential of a time series of high-resolution multispectral Unmanned Aerial Vehicle (UAV) imagery, together with three advanced ML techniques, namely Random Forest Regression (RFR), Support Vector Regression (SVR), and Nonlinear Autoregressive Exogenous Artificial Neural Network (NARX ANN), as a solution to plot-level sugarcane yield prediction. An experimental sugarcane field containing 48 plots was selected, and UAV imagery was collected during the early and middle crop growth stages of three consecutive cropping seasons. The dataset for each growth stage was analysed separately to determine how early the pre-harvest yield can be predicted. The three ML models were trained and tested on the datasets of the first two cropping seasons, using 10-fold cross-validation to limit overfitting. The third cropping season dataset was then used to evaluate the reliability of the developed prediction models. The results show that the correlation of Vegetation Indices (VIs) with crop yield is stronger in the middle stage than in the early stage for all three ML models. Moreover, a comparison of the models indicates that the NARX ANN method outperformed the others in the middle stage, with the highest coefficient of determination (R2) of 0.96 and the lowest Root Mean Square Error (RMSE) of 4.92 t/ha.
It was followed by SVR (R2 = 0.52, RMSE = 14.85 t/ha), which performed similarly to RFR (R2 = 0.48, RMSE = 11.20 t/ha). In conclusion, the best-suited model for predicting sugarcane yield during the middle growth stage is a NARX ANN model employing the Normalized Difference Red Edge (NDRE) index. This demonstrates the feasibility of ML approaches for predicting plot-level sugarcane yield at a specific period of growth, as they are less sensitive to inconsistency in data collection times.
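For readers unfamiliar with the quantities named in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the standard NDRE definition, (NIR - RedEdge) / (NIR + RedEdge), treats R2 as the coefficient of determination, and splits plots into contiguous folds for 10-fold cross-validation. All function names (`ndre`, `rmse`, `r_squared`, `k_fold_indices`) are hypothetical and chosen only for illustration.

```python
import math


def ndre(nir, red_edge):
    """Normalized Difference Red Edge from two reflectance values.

    Assumes the standard definition: (NIR - RedEdge) / (NIR + RedEdge).
    """
    denom = nir + red_edge
    if denom == 0:
        raise ValueError("NIR + RedEdge must be non-zero")
    return (nir - red_edge) / denom


def rmse(observed, predicted):
    """Root Mean Square Error, in the units of the yield (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one sample. No shuffling is done here;
    a real analysis would normally shuffle the plots first.
    """
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # 48 plots, as in the study; the reflectance values are made up.
    print("NDRE:", ndre(0.5, 0.3))
    for fold, (train, test) in enumerate(k_fold_indices(48, k=10)):
        print(f"fold {fold}: {len(train)} train plots, {len(test)} test plots")
```

In each fold a regression model would be fitted on the training plots and scored with `rmse` and `r_squared` on the held-out plots. The sketch deliberately omits the model fitting, since the abstract does not specify hyperparameters or feature sets.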