Scientific Data (Feb 2025)
Mapping global yields of four major crops at 5-minute resolution from 1982 to 2015 using multi-source data and machine learning
Abstract
Accurate, historical, and continuous global crop yield data are essential for assessing risks to the global food system. However, existing datasets often have limited spatial and temporal resolution. Here, we introduce GlobalCropYield5min, a novel gridded dataset providing yield data for four major crops (maize, rice, wheat, and soybean) from 1982 to 2015 at a spatial resolution of 5 arc-minutes. For each country and crop, we developed three machine learning (ML) models using crop statistics from approximately 12,000 administrative units, along with satellite data, climate variables, soil properties, agricultural practices, and climate modes. The optimal predictors and ML model were then selected to estimate annual crop yield for each 5 × 5 arc-minute grid cell. Results show good model performance, with R² ranging from 0.70 to 0.95 and RMSE (NRMSE) ranging from 0.16 t/ha (5%) to 1.1 t/ha (20%). GlobalCropYield5min outperforms other global yield datasets in spatial resolution, temporal coverage, and accuracy. This dataset is crucial for investigating climate-crop yield interactions and managing agricultural disaster risks.
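The accuracy metrics quoted in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: R² is taken as the coefficient of determination, NRMSE as RMSE divided by the mean observed yield, and model selection as picking the candidate with the lowest RMSE. The function names and the `candidates` dictionary are hypothetical and are not part of the dataset's code.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the same units as the yields (e.g. t/ha)."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def nrmse(observed, predicted):
    """RMSE normalised by the mean observed yield (assumed normalisation)."""
    mean_observed = sum(observed) / len(observed)
    return rmse(observed, predicted) / mean_observed


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def select_best_model(observed, candidates):
    """Return the name of the candidate whose predictions have the lowest RMSE.

    `candidates` maps a hypothetical model name to its list of predictions.
    """
    return min(candidates, key=lambda name: rmse(observed, candidates[name]))
```

For example, with observed yields `[2.0, 4.0, 6.0]` t/ha and predictions `[3.0, 4.0, 5.0]` t/ha, `r_squared` gives 0.75 and `nrmse` gives about 0.20, that is, 20% of the mean observed yield.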