International Journal of Applied Earth Observation and Geoinformation (Feb 2025)

The impact of spatiotemporal variability of environmental conditions on wheat yield forecasting using remote sensing data and machine learning

  • Keltoum Khechba
  • Mariana Belgiu
  • Ahmed Laamrani
  • Alfred Stein
  • Abdelhakim Amazirh
  • Abdelghani Chehbouni

Journal volume & issue
Vol. 136
Article 104367

Abstract

Climate change poses significant challenges to food security, especially in semi-arid agricultural areas. Effective monitoring of crop yield is important for establishing food emergency responses and developing long-term sustainable strategies. In Morocco, where cereals are the predominant crops, yield forecasting helps address the yield gap, as it enables farmers to take preventive action before harvest. This study assesses the impact of spatial and temporal heterogeneity of environmental conditions on wheat yield forecasting using machine learning models. It compares the 2019–2020 and 2020–2021 agricultural seasons using three sets of input variables: (1) spectral indices; (2) weather data; and (3) a combination of spectral indices and weather data. Weather data, namely cumulative monthly precipitation from PERSIANN and average monthly temperature from ERA5, were extracted for the wheat growing season (November to June). Spectral indices, including the Normalized Difference Vegetation Index, the Moisture Stress Index, and the Terrestrial Chlorophyll Index, were calculated from Sentinel-2 imagery for the same period and processed in Google Earth Engine. The study area was divided into homogeneous zones based on an existing landform classification, and XGBoost and Random Forest (RF) models were trained for yield forecasting in each zone separately. With weather data as input, the two models performed equally well both within the zones and across the whole study area (SA); across SA, their R² values averaged over all months were 0.60 and 0.81 for the 2019–2020 and 2020–2021 agricultural seasons, respectively. However, with spectral indices alone or combined with weather data, RF consistently outperformed XGBoost. For example, in SA during the 2019–2020 season, RF achieved an average R² of 0.48 across the growing season, compared with 0.43 for XGBoost. Similarly, in the 2020–2021 season, RF achieved an R² of 0.35 and an RMSE of 1083 kg ha⁻¹, while XGBoost performed slightly worse, with an R² of 0.29 and an RMSE of 1137 kg ha⁻¹. Comparing prediction accuracy between seasons for each set of variables, RF performed better with spectral indices in the relatively dry 2019–2020 season than in the wet 2020–2021 season; adding weather data improved its performance in the 2020–2021 season. April gave the highest prediction performance overall, with R² values of 0.6 for SA using weather data alone in the 2019–2020 season and 0.8 for SA using combined weather data and spectral indices in the 2020–2021 season. Accuracy fluctuated strongly through the 2019–2020 growing season, whereas in the 2020–2021 season it improved consistently over time. These variations in accuracy are attributed to differing environmental conditions, which should be taken into account to make yield predictions more reliable.
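
The abstract names the spectral indices but not their formulas. The sketch below assumes the band combinations commonly used with Sentinel-2 surface reflectance: NDVI from B8 (NIR) and B4 (red), MSI as B11/B8 (SWIR over NIR), and the red-edge form (B6 - B5)/(B5 - B4) for the Terrestrial Chlorophyll Index. It then averages each index per calendar month, as one possible way to obtain monthly predictors. This is plain Python on per-pixel reflectance values, not the Google Earth Engine processing used in the study; the function names, the band dictionary layout, and the sample reflectances are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def ndvi(b4: float, b8: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (b8 - b4) / (b8 + b4)


def msi(b8: float, b11: float) -> float:
    """Moisture Stress Index: SWIR (B11) / NIR (B8); higher means a drier canopy."""
    return b11 / b8


def mtci(b4: float, b5: float, b6: float) -> float:
    """Terrestrial Chlorophyll Index in its assumed Sentinel-2 red-edge form.

    Undefined (division by zero) when B5 equals B4.
    """
    return (b6 - b5) / (b5 - b4)


def monthly_means(observations):
    """Average each index per calendar month.

    observations: iterable of (date, {"B4", "B5", "B6", "B8", "B11": reflectance}).
    Returns {(year, month): {"NDVI": mean, "MSI": mean, "MTCI": mean}},
    ordered chronologically.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for day, bands in observations:
        key = (day.year, day.month)
        buckets[key]["NDVI"].append(ndvi(bands["B4"], bands["B8"]))
        buckets[key]["MSI"].append(msi(bands["B8"], bands["B11"]))
        buckets[key]["MTCI"].append(mtci(bands["B4"], bands["B5"], bands["B6"]))
    return {
        key: {name: mean(values) for name, values in buckets[key].items()}
        for key in sorted(buckets)
    }


if __name__ == "__main__":
    # Hypothetical reflectances for two April scenes and one May scene.
    scenes = [
        (date(2021, 4, 3), {"B4": 0.05, "B5": 0.08, "B6": 0.20, "B8": 0.35, "B11": 0.21}),
        (date(2021, 4, 18), {"B4": 0.04, "B5": 0.07, "B6": 0.22, "B8": 0.40, "B11": 0.20}),
        (date(2021, 5, 8), {"B4": 0.07, "B5": 0.10, "B6": 0.18, "B8": 0.30, "B11": 0.24}),
    ]
    for month, values in monthly_means(scenes).items():
        print(month, {name: round(v, 3) for name, v in values.items()})
```

Averaging per month mirrors the monthly resolution of the weather variables, so the two sets of predictors can be combined month by month; other compositing choices (median, maximum NDVI) would be equally plausible.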

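The results are reported as R² and RMSE for individual zones and for the whole study area (SA). A minimal sketch of both metrics follows, assuming R² is the coefficient of determination (the abstract does not say whether the squared correlation was used instead). The hypothetical helper `evaluate_by_zone` also pools every record into an extra "SA" entry; this is only one option, since the study may have fitted a separate model over SA.

```python
from collections import defaultdict
from math import sqrt


def r2_score(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error, in the units of the yield (e.g. kg/ha)."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return sqrt(squared / len(observed))


def evaluate_by_zone(records):
    """records: iterable of (zone, observed_yield, predicted_yield).

    Returns {zone: {"R2": ..., "RMSE": ...}} plus a pooled "SA" entry
    (an assumption: all zones' records evaluated together).
    """
    by_zone = defaultdict(lambda: ([], []))
    for zone, obs, pred in records:
        for key in (zone, "SA"):
            by_zone[key][0].append(obs)
            by_zone[key][1].append(pred)
    return {
        zone: {"R2": r2_score(obs, pred), "RMSE": rmse(obs, pred)}
        for zone, (obs, pred) in by_zone.items()
    }


if __name__ == "__main__":
    # Hypothetical field-level yields (kg/ha) in two hypothetical zones.
    records = [
        ("Z1", 2000, 2100), ("Z1", 3000, 2900), ("Z1", 2500, 2500),
        ("Z2", 1000, 1200), ("Z2", 1500, 1400),
    ]
    for zone, metrics in evaluate_by_zone(records).items():
        print(zone, {k: round(v, 3) for k, v in metrics.items()})
```

R² is undefined for a zone whose observed yields are all equal (SS_tot = 0), so each zone needs at least two distinct observations; a season-level figure such as "average R² across the growing season" would then be the mean of these scores over the monthly models.
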
Keywords